feat: implement basic tool system (registry, invocation, results) (#15) by Aureliolo · Pull Request #104 · Aureliolo/synthorg

Aureliolo · 2026-03-05T16:04:47Z

Summary

Implements the application-level tool system that bridges LLM ToolCall objects with actual tool execution (Issue #15, Design Spec §11.1-11.2).

Error hierarchy: ToolError → ToolNotFoundError, ToolParameterError, ToolExecutionError with immutable MappingProxyType context
BaseTool ABC: Abstract base with ToolExecutionResult frozen Pydantic model, deep-copy schema protection, to_definition() for LLM integration
ToolRegistry: Immutable MappingProxyType-backed registry with duplicate rejection, sorted tool listing, and to_definitions() for provider layer
ToolInvoker: Validates parameters via JSON Schema (with SSRF-safe $ref resolution), executes tools, never propagates exceptions — all errors become ToolResult(is_error=True)
EchoTool: Reference implementation demonstrating the tool contract
10 observability event constants for structured logging across the tool system
jsonschema==4.26.0 added as explicit dependency (was only transitive via litellm)

Key Design Decisions

ToolInvoker uses referencing.Registry with a custom retrieve function that raises NoSuchResource to block remote $ref resolution (SSRF prevention)
except MemoryError, RecursionError: raise before except Exception ensures non-recoverable errors propagate (PEP 758 syntax)
BaseTool.parameters_schema property returns copy.deepcopy() to prevent external mutation
invoke_all() executes sequentially (parallel execution deferred to future iteration)

Test Plan

65 unit tests covering errors, base, registry, invoker, and echo tool
All 1665 tests pass (pytest tests/ -n auto)
95.06% overall coverage (80% minimum enforced)
mypy strict passes on all source and test files
ruff lint + format clean

Review Coverage

Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, security-reviewer). 17 findings addressed:

SSRF prevention in JSON Schema validation
SchemaError vs ValidationError separation
MemoryError/RecursionError re-raise
Schema property deep copy protection
Log level corrections (warning vs error)
Soft error propagation path coverage
Unhashable type containment check
Empty error message fallback

Closes #15

🤖 Generated with Claude Code

Adds the application-level tool system that bridges LLM tool calls with actual tool execution: - ToolError hierarchy (ToolNotFoundError, ToolParameterError, ToolExecutionError) - BaseTool ABC with ToolExecutionResult frozen model - ToolRegistry with immutable MappingProxyType backing - ToolInvoker with JSON Schema validation and SSRF-safe $ref resolution - EchoTool reference implementation - 65 unit tests with near-100% coverage on new modules Pre-reviewed by 8 agents, 17 findings addressed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-05T16:05:00Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

Package	Version	Score	Details
pip/jsonschema	4.26.0	Unknown	Unknown

Scanned Files

pyproject.toml

coderabbitai · 2026-03-05T16:05:05Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7df468d-61ac-4dc0-9b48-bbd01a009d01

📥 Commits

Reviewing files that changed from the base of the PR and between e3edf25 and 88bc06f.

📒 Files selected for processing (3)

src/ai_company/observability/events.py
src/ai_company/tools/invoker.py
src/ai_company/tools/registry.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Comprehensive tool framework: register, validate, and invoke tools with lifecycle events and an example echo tool.
Error Handling
- Structured tool error model with immutable context and clear error types; consistent error/results mapping from invocations.
Integration
- Tool invocation engine supporting single and sequential calls, parameter validation, and robust handling of execution outcomes.
Tests
- Extensive unit test suite covering registry, invoker, tools, and error behaviors.
Chores
- Added jsonschema dependency and a type-check override for it.

Walkthrough

Adds a new tool subsystem: immutable ToolExecutionResult and BaseTool, a ToolError hierarchy, an immutable ToolRegistry, a ToolInvoker with JSON Schema validation and SSRF blocking, an EchoTool example, observability event constants, jsonschema dependency, and comprehensive unit tests and fixtures.

Changes

Cohort / File(s)	Summary
Project metadata & observability `pyproject.toml`, `src/ai_company/observability/events.py`	Added `jsonschema==4.26.0` and mypy ignore for `jsonschema.`. Introduced tool lifecycle event constants (`TOOL_REGISTRY_BUILT`, `TOOL_REGISTRY_DUPLICATE`, `TOOL_NOT_FOUND`, `TOOL_INVOKE_`, `TOOL_BASE_INVALID_NAME`, `TOOL_REGISTRY_CONTAINS_TYPE_ERROR`).
Core types & exports `src/ai_company/tools/__init__.py`, `src/ai_company/tools/base.py`	Added `ToolExecutionResult` (frozen model) and `BaseTool` ABC with name/description/parameters_schema and `to_definition()`/abstract `execute()`. Re-exported core tool symbols in package `__all__`.
Errors `src/ai_company/tools/errors.py`	New `ToolError` base class storing an immutable context and subclasses `ToolNotFoundError`, `ToolParameterError`, `ToolExecutionError`.
Registry `src/ai_company/tools/registry.py`	New immutable `ToolRegistry` built from Iterable[BaseTool]; rejects duplicates, logs events, provides `get()`, `list_tools()`, `to_definitions()`, `__contains__()` with type checks, and `__len__()`. Raises `ToolNotFoundError` for missing tools.
Invoker `src/ai_company/tools/invoker.py`	New `ToolInvoker` with `invoke()` and `invoke_all()`; looks up tools, validates arguments against JSON Schema with remote-$ref blocking, executes tools, maps exceptions to `ToolResult` (re-raises non-recoverable errors), and emits structured logs/events.
Example tool `src/ai_company/tools/examples/echo.py`	Added `EchoTool` implementing `BaseTool` with a required `message` parameter and async `execute()` that echoes the message.
Tests & fixtures `tests/unit/tools/conftest.py`, `tests/unit/tools/test_base.py`, `tests/unit/tools/test_echo.py`, `tests/unit/tools/test_errors.py`, `tests/unit/tools/test_invoker.py`, `tests/unit/tools/test_registry.py`	Added extensive fixtures and unit tests covering base types, error hierarchy, registry behaviors (duplicates, immutability, lookups), invoker flows (validation, not-found, execution errors, SSRF protection, non-recoverable errors), and EchoTool integration.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant Registry as ToolRegistry
    participant Tool as BaseTool

    Client->>Invoker: invoke(tool_call)
    Invoker->>Registry: get(tool_name)
    alt tool found
        Registry-->>Invoker: BaseTool
        Invoker->>Invoker: validate(arguments, schema) (jsonschema, no-remote-$ref)
        alt validation OK
            Invoker->>Tool: execute(arguments)
            Tool-->>Invoker: ToolExecutionResult
            Invoker-->>Client: ToolResult (is_error: false)
        else validation failed
            Invoker-->>Client: ToolResult (is_error: true, validation error)
        end
    else not found
        Registry-->>Invoker: raises ToolNotFoundError
        Invoker-->>Client: ToolResult (is_error: true, not found)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: design unified provider interface #86: Adds provider-layer models (ToolDefinition, ToolCall, ToolResult) that the new registry/invoker and tests import and use.
feat: wire all modules into observability system #97: Modifies observability event constants overlapping src/ai_company/observability/events.py.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 19.23% which is insufficient. The required threshold is 100.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: implement basic tool system (registry, invocation, results)' accurately summarizes the main change: a comprehensive tool system with registry, invoker, and result handling.
Description check	✅ Passed	The pull request description provides detailed context about the tool system implementation, design decisions, test coverage, and links to Issue `#15`, directly related to the changeset.
Linked Issues check	✅ Passed	All acceptance criteria from Issue `#15` are met: tool base class (BaseTool ABC), registry with register/lookup/list (ToolRegistry), invoker with validation (ToolInvoker), structured result format (ToolExecutionResult, ToolResult), JSON Schema support, EchoTool example, 65+ unit tests, and error handling.
Out of Scope Changes check	✅ Passed	All changes directly support Issue `#15` requirements: error hierarchy, base tool abstraction, registry, invoker, observability events, and dependencies. Test fixtures and example implementations are in-scope for the tool system.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/tool-system

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-03-05T16:05:55Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational tool system that enables the application to effectively manage and execute tools based on LLM requests. It establishes clear abstractions for tools, robust error handling, and secure parameter validation, providing a reliable and extensible framework for integrating external functionalities.

Highlights

Tool System Implementation: Implemented the core application-level tool system, bridging LLM ToolCall objects with actual tool execution, as outlined in Issue Design and implement basic tool system (registry, invocation, results) #15 and Design Spec §11.1-11.2.
Robust Error Handling: Introduced a comprehensive error hierarchy (ToolError -> ToolNotFoundError, ToolParameterError, ToolExecutionError) with immutable MappingProxyType context for structured metadata.
BaseTool Abstraction: Defined BaseTool as an Abstract Base Class (ABC) with a ToolExecutionResult frozen Pydantic model, deep-copy schema protection, and a to_definition() method for LLM integration.
Tool Registry and Invoker: Developed ToolRegistry for immutable, duplicate-rejecting tool storage and ToolInvoker for validating parameters via JSON Schema (with SSRF-safe $ref resolution) and executing tools, ensuring errors are always returned as ToolResult(is_error=True).
Observability and Dependencies: Added 10 new observability event constants for structured logging within the tool system and explicitly included jsonschema==4.26.0 as a direct dependency.
Security and Stability: Implemented SSRF prevention in JSON Schema validation and ensured non-recoverable errors like MemoryError and RecursionError propagate, rather than being caught by the invoker's general exception handling.

Changelog

pyproject.toml
- Added jsonschema==4.26.0 to the project's dependencies.
- Configured MyPy to ignore missing imports for the jsonschema module.
src/ai_company/observability/events.py
- Introduced 10 new Final[str] constants for logging events related to the tool lifecycle, including registry operations, invocation start/success, and various error types.
src/ai_company/tools/init.py
- Created the __init__.py file to define the tools package.
- Exported BaseTool, ToolExecutionResult, ToolError, ToolExecutionError, ToolNotFoundError, ToolParameterError, ToolInvoker, ToolRegistry, and EchoTool for public access.
src/ai_company/tools/base.py
- Added ToolExecutionResult Pydantic model to standardize tool output, including content, error status, and optional metadata.
- Implemented BaseTool as an abstract base class for all tools, enforcing execute method implementation.
- Ensured tool parameters_schema is deep-copied on initialization and when accessed via property to prevent external mutation.
src/ai_company/tools/errors.py
- Created a hierarchy of custom exceptions for tool-related errors: ToolError (base), ToolNotFoundError, ToolParameterError, and ToolExecutionError.
- Designed ToolError to carry an immutable MappingProxyType context for structured error details.
src/ai_company/tools/examples/echo.py
- Added EchoTool as a reference implementation of BaseTool, demonstrating how to define a tool with a fixed schema and execute method.
src/ai_company/tools/invoker.py
- Implemented ToolInvoker responsible for looking up tools, validating arguments against JSON Schema, and executing tool methods.
- Integrated a custom _no_remote_retrieve function with referencing.Registry to prevent Server-Side Request Forgery (SSRF) during JSON Schema $ref resolution.
- Ensured ToolInvoker catches all tool-related exceptions and returns them as ToolResult(is_error=True), preventing exceptions from propagating.
- Added specific logging events for various stages and outcomes of tool invocation.
src/ai_company/tools/registry.py
- Implemented ToolRegistry as an immutable container for BaseTool instances, preventing duplicate tool names.
- Provided methods for retrieving tools by name, listing registered tool names, and converting tools to ToolDefinition objects for LLM providers.
- Integrated logging for registry build and duplicate tool warnings.
tests/unit/tools/conftest.py
- Added a new conftest.py file to provide pytest fixtures for the tool system.
- Defined concrete test tool implementations (_EchoTestTool, _FailingTool, _NoSchemaTool, _StrictSchemaTool, _SoftErrorTool) for comprehensive testing.
- Provided fixtures for ToolRegistry and ToolInvoker instances pre-populated with test tools.
tests/unit/tools/test_base.py
- Added unit tests for ToolExecutionResult to verify its immutability and data handling.
- Implemented unit tests for BaseTool to confirm property behavior, name validation, schema deep-copying, and to_definition conversion.
tests/unit/tools/test_echo.py
- Added unit tests for EchoTool to verify its properties, execution logic, and integration within the full tool pipeline (registry -> invoker -> tool call).
tests/unit/tools/test_errors.py
- Added unit tests for the ToolError hierarchy, verifying message and context storage, string representation, and proper subclassing.
tests/unit/tools/test_invoker.py
- Added extensive unit tests for ToolInvoker covering successful invocations, handling of not-found tools, parameter validation (valid, invalid, missing, extra parameters), soft errors, and execution errors.
- Included tests for the invoke_all method to ensure correct sequential execution and error handling for multiple tool calls.
tests/unit/tools/test_registry.py
- Added unit tests for ToolRegistry to verify its behavior with empty, single, and multiple tool registrations.
- Confirmed correct handling of duplicate tool names, tool lookup, and conversion to ToolDefinition objects.
- Tested the immutability of the registry's internal tool mapping.
uv.lock
- Updated the uv.lock file to include jsonschema as a dependency, ensuring consistent environment builds.

Activity

The pull request implements the application-level tool system, addressing Issue Design and implement basic tool system (registry, invocation, results) #15 and following Design Spec §11.1-11.2.
The author performed a comprehensive test plan, including 65 unit tests covering various components of the tool system.
All 1665 existing tests passed, and the overall code coverage reached 95.06%, exceeding the 80% minimum.
MyPy strict type checking passed on all source and test files, and Ruff linting and formatting checks were clean.
The changes underwent pre-review by 8 different AI agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, security-reviewer).
17 findings from the pre-review were addressed, including SSRF prevention, error separation, memory/recursion error re-raise, schema property deep copy protection, log level corrections, soft error propagation, unhashable type containment, and empty error message fallbacks.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

greptile-apps · 2026-03-05T16:08:32Z

Greptile Summary

This PR implements the complete application-level tool system (Issue #15, Design Spec §11.1–11.2), bridging LLM ToolCall objects with concrete tool execution through a clean four-layer stack: ToolError hierarchy → BaseTool ABC → ToolRegistry → ToolInvoker. The implementation is solid and production-ready with thorough test coverage (65 tests, 95% coverage), SSRF-safe JSON Schema validation via a blocking referencing.Registry, and a well-reasoned non-recoverable error propagation contract.

Three previously flagged issues are fully resolved: to_definition() now uses the deep-copy parameters_schema property; _validate_params correctly re-raises MemoryError/RecursionError; and the TOOL_INVOKE_VALIDATION_UNEXPECTED constant correctly separates unexpected validation failures from SchemaError in the observability pipeline.
PEP 758 convention: Both except (MemoryError, RecursionError) as exc: clauses in invoker.py (lines 157, 240) use the parenthesized form. CLAUDE.md mandates except A, B: without parentheses for Python 3.14.
logger.exception in helper methods: _schema_error_result and _unexpected_validation_result call logger.exception() while relying on implicit exception-context propagation from their callers' except blocks. This works today, but passing the live exception explicitly (or using logger.error(..., exc_info=exc)) would make the dependency explicit and prevent silent misuse if these helpers are ever called outside an except handler.

Confidence Score: 4/5

Safe to merge; the two style findings are minor and do not affect correctness or runtime behavior.
The core logic is correct — SSRF prevention is tested and functional, non-recoverable errors propagate as documented, and the deep-copy schema protection is consistently applied. The only deductions are two instances of the parenthesized except syntax (CLAUDE.md convention violation) and a fragile implicit-exception-context pattern in the logging helper methods, neither of which causes a runtime bug.
src/ai_company/tools/invoker.py — lines 157, 176–181, and 240 warrant the small clean-ups described in comments.

Important Files Changed

Filename	Overview
src/ai_company/tools/invoker.py	Core invocation engine with SSRF-safe JSON Schema validation, non-recoverable error re-raise, and structured error results; two minor style issues (parenthesized except syntax, fragile logger.exception in helpers).
src/ai_company/tools/base.py	Clean ABC with deep-copy schema protection on both the property and to_definition(); frozen ToolExecutionResult model with acknowledged shallow-copy metadata caveat.
src/ai_company/tools/registry.py	Immutable MappingProxyType-backed registry with duplicate rejection, sorted listing, and safe contains type guard; no issues found.
src/ai_company/tools/errors.py	Well-structured error hierarchy with immutable MappingProxyType context and clear str formatting; no issues found.
src/ai_company/observability/events.py	Adds 14 new Final[str] event constants for tool lifecycle covering registry, lookup, validation, execution, and non-recoverable error paths; all namespaced consistently under tool.*.
tests/unit/tools/conftest.py	Comprehensive test fixtures covering success, soft-error, hard-error, recursion, invalid schema, SSRF, and empty-message edge cases; all fixtures use vendor-agnostic names per CLAUDE.md.

Sequence Diagram

sequenceDiagram
    participant LLM as LLM Provider
    participant TI as ToolInvoker
    participant TR as ToolRegistry
    participant BT as BaseTool
    participant JS as jsonschema (SSRF-safe)

    LLM->>TI: invoke(ToolCall)
    TI->>TR: get(tool_call.name)
    alt Tool not found
        TR-->>TI: raises ToolNotFoundError
        TI-->>LLM: ToolResult(is_error=True)
    else Tool found
        TR-->>TI: BaseTool instance
        TI->>JS: validate(arguments, schema, _SAFE_REGISTRY)
        alt SchemaError (invalid schema def)
            JS-->>TI: raises SchemaError
            TI-->>LLM: ToolResult(is_error=True)
        else ValidationError (bad arguments)
            JS-->>TI: raises ValidationError
            TI-->>LLM: ToolResult(is_error=True)
        else MemoryError / RecursionError
            JS-->>TI: raises non-recoverable
            TI-->>LLM: re-raises (propagates)
        else Unexpected exception
            JS-->>TI: raises Exception
            TI-->>LLM: ToolResult(is_error=True)
        else Validation passed
            JS-->>TI: None (success)
            TI->>BT: execute(arguments=...)
            alt Execution raises MemoryError/RecursionError
                BT-->>TI: raises non-recoverable
                TI-->>LLM: re-raises (propagates)
            else Execution raises Exception
                BT-->>TI: raises Exception
                TI-->>LLM: ToolResult(is_error=True)
            else Execution returns result
                BT-->>TI: ToolExecutionResult
                TI-->>LLM: ToolResult(content, is_error)
            end
        end
    end

_{Last reviewed commit: 88bc06f}

src/ai_company/tools/base.py

Copilot

Pull request overview

Implements the core “tool system” layer that maps provider ToolCalls to registered application tools, with JSON Schema validation, structured results, and observability events (Issue #15 / Design Spec §11.1–11.2).

Changes:

Added BaseTool + ToolExecutionResult, tool error hierarchy, and an EchoTool example implementation.
Added ToolRegistry (immutable mapping) and ToolInvoker (schema validation + safe execution returning ToolResult on failures).
Added tool-related observability event constants and pinned jsonschema==4.26.0 as a direct dependency; included unit test suite for the tool subsystem.

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
uv.lock	Adds `jsonschema` to the locked dependencies.
pyproject.toml	Pins `jsonschema==4.26.0` and adds a mypy override for `jsonschema.*`.
src/ai_company/tools/base.py	Introduces `BaseTool` abstraction and `ToolExecutionResult`.
src/ai_company/tools/errors.py	Adds `ToolError` hierarchy with immutable context mapping.
src/ai_company/tools/registry.py	Implements immutable `ToolRegistry` with lookup/list/definitions.
src/ai_company/tools/invoker.py	Implements tool invocation with JSON Schema validation and error-to-`ToolResult` conversion.
src/ai_company/tools/examples/echo.py	Adds a reference `EchoTool` implementation.
src/ai_company/tools/init.py	Exposes tool system public API surface.
src/ai_company/observability/events.py	Adds tool lifecycle event constants for structured logging.
tests/unit/tools/conftest.py	Adds fixtures and test-only tools for invoker/registry testing.
tests/unit/tools/test_base.py	Unit tests for `BaseTool` and `ToolExecutionResult`.
tests/unit/tools/test_errors.py	Unit tests for error hierarchy and context immutability.
tests/unit/tools/test_registry.py	Unit tests for registry behavior and immutability.
tests/unit/tools/test_invoker.py	Unit tests for invoker happy-path, validation, and failure modes.
tests/unit/tools/test_echo.py	Unit + integration-style tests for `EchoTool` pipeline.
tests/unit/tools/init.py	Adds tools test package marker.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T16:11:23Z

src/ai_company/tools/invoker.py

+        try:
+            result = await tool.execute(arguments=dict(tool_call.arguments))
+        except MemoryError, RecursionError:
+            raise
+        except Exception as exc:
+            error_msg = str(exc) or f"{type(exc).__name__} (no message)"


invoke() catches all Exception from tool.execute(...), which will also catch asyncio.CancelledError when it subclasses Exception (common in recent Python versions). That breaks task cancellation semantics and contradicts the module docstring claim that CancelledError propagates. Add an explicit except asyncio.CancelledError: raise (and keep it before the broad except Exception) so cancellations are not converted into ToolResult errors.

Copilot · 2026-03-05T16:11:24Z

src/ai_company/tools/registry.py

+            available = sorted(self._tools) or ["(none)"]
+            logger.warning(
+                TOOL_NOT_FOUND,
+                name=name,


For TOOL_NOT_FOUND logging, this uses the field key name=... while the rest of the tool logging consistently uses tool_name=... (e.g., all TOOL_INVOKE_* events). Using inconsistent keys makes log filtering/analytics harder; consider renaming name to tool_name here for consistency.

Suggested change

name=name,

tool_name=name,

Copilot · 2026-03-05T16:11:24Z

src/ai_company/tools/invoker.py

+        Returns:
+            Tuple of results in the same order as the input.
+        """
+        return tuple([await self.invoke(call) for call in tool_calls])


invoke_all() builds a list via comprehension and then immediately converts it to a tuple (tuple([ ... ])), which does an extra allocation/copy. Prefer collecting results in a list and returning tuple(results), or other approaches that avoid the intermediate list-to-tuple conversion overhead.

Suggested change

return tuple([await self.invoke(call) for call in tool_calls])

results: list[ToolResult] = []

for call in tool_calls:

results.append(await self.invoke(call))

return tuple(results)

gemini-code-assist

Code Review

This pull request introduces a comprehensive and well-designed tool system, including an abstract base class for tools, a registry for managing them, and an invoker for executing them. While the implementation demonstrates strong attention to security (SSRF prevention), there are critical issues related to internal state mutation, version-dependent exception handling that could lead to a SyntaxError in some Python versions, and potential information exposure through error messages. Addressing these will ensure the system's reliability and security.

gemini-code-assist · 2026-03-05T16:13:02Z

src/ai_company/tools/invoker.py

+
+        try:
+            result = await tool.execute(arguments=dict(tool_call.arguments))
+        except MemoryError, RecursionError:


The except MemoryError, RecursionError: syntax is problematic. While it's valid in Python 3.13+ (PEP 758), in earlier Python 3 versions, it's interpreted as except MemoryError as RecursionError:. This means RecursionError is not caught as intended and the RecursionError built-in class name is shadowed. This could lead to SyntaxError in some Python 3 environments or incorrect exception handling, contradicting the design goal of propagating non-recoverable errors.

Suggested change

except MemoryError, RecursionError:

except (MemoryError, RecursionError):

gemini-code-assist · 2026-03-05T16:13:02Z

src/ai_company/tools/base.py

+        return ToolDefinition(
+            name=self._name,
+            description=self._description,
+            parameters_schema=self._parameters_schema or {},


The to_definition method returns a ToolDefinition object that contains a reference to the tool's internal parameters_schema dictionary. Since dictionaries are mutable in Python, any external modification to the parameters_schema in the returned ToolDefinition will directly mutate the tool's internal state. This can lead to unexpected behavior or security bypasses if the schema is used for input validation in other parts of the system (like the ToolInvoker).

Suggested change

parameters_schema=self._parameters_schema or {},

parameters_schema=copy.deepcopy(self._parameters_schema) if self._parameters_schema is not None else {},

gemini-code-assist · 2026-03-05T16:13:02Z

src/ai_company/tools/invoker.py

+        except MemoryError, RecursionError:
+            raise
+        except Exception as exc:
+            error_msg = str(exc) or f"{type(exc).__name__} (no message)"


When a tool execution fails, the raw exception message (str(exc)) is captured and returned to the LLM as part of the ToolResult. These error messages may contain sensitive information about the system's internal state, database schemas, or third-party service details, depending on the tool's implementation. An attacker could potentially use the LLM to trigger specific errors and exfiltrate this information.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 77-194: The invoke() method is too large; split it into small
private helpers: implement _get_tool(tool_name, tool_call_id) to perform
registry lookup and handle ToolNotFoundError, _validate_parameters(tool,
tool_call) to run jsonschema.validate and return ToolParameterError/ToolResult
on SchemaError/ValidationError, _execute_tool(tool, tool_call) to call await
tool.execute(...) and translate exceptions into ToolExecutionError (preserve the
MemoryError/RecursionError re-raise behavior), and
_map_execution_result(tool_call, result) to log
TOOL_INVOKE_TOOL_ERROR/TOOL_INVOKE_SUCCESS and return the final ToolResult; then
refactor invoke() to orchestrate these helpers so each helper is under ~50 lines
and invoke() becomes a thin coordinator calling _get_tool, _validate_parameters,
_execute_tool, and _map_execution_result.
- Around line 157-158: The except clause in invoke that catches MemoryError and
RecursionError re-raises them silently; update that clause in
src/ai_company/tools/invoker.py (function invoke) to log an ERROR-level message
with contextual details (including input args/state and exception info) before
re-raising, using the module's existing logger object (e.g., logger) and
exc_info=True; after adding the logging, refactor invoke into smaller helpers
(validate_params, execute_tool, handle_results, and handle_errors) to keep each
function under ~50 lines and separate validation, execution, and error-logging
responsibilities.

In `@src/ai_company/tools/registry.py`:
- Around line 64-68: The state-transition log using TOOL_REGISTRY_BUILT is
currently emitted with logger.debug; change that call to logger.info so the
event is logged at INFO level (replace the logger.debug(...) call that passes
TOOL_REGISTRY_BUILT, tool_count=len(self._tools), tools=sorted(self._tools) with
logger.info while preserving the same message and keyword args) — locate this in
the ToolRegistry/build (or the method containing the TOOL_REGISTRY_BUILT
emission) in registry.py and update the logging level only.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2aa8c784-cbed-4316-ac43-ffec8d080bef

📥 Commits

Reviewing files that changed from the base of the PR and between 92e9023 and 54e1337.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (16)

pyproject.toml
src/ai_company/observability/events.py
src/ai_company/tools/__init__.py
src/ai_company/tools/base.py
src/ai_company/tools/errors.py
src/ai_company/tools/examples/__init__.py
src/ai_company/tools/examples/echo.py
src/ai_company/tools/invoker.py
src/ai_company/tools/registry.py
tests/unit/tools/__init__.py
tests/unit/tools/conftest.py
tests/unit/tools/test_base.py
tests/unit/tools/test_echo.py
tests/unit/tools/test_errors.py
tests/unit/tools/test_invoker.py
tests/unit/tools/test_registry.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Agent
GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models with BaseModel, model_validator, and ConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with: from ai_company.observability import get_logger then logger = get_logger(__name__)
Always use variable name logger (not _logger, not log) for the logging instance
Pure data models, enums, and re-exports do not require logging

Files:

tests/unit/tools/test_registry.py
tests/unit/tools/test_errors.py
src/ai_company/observability/events.py
src/ai_company/tools/errors.py
src/ai_company/tools/base.py
src/ai_company/tools/__init__.py
tests/unit/tools/test_base.py
tests/unit/tools/test_echo.py
src/ai_company/tools/registry.py
tests/unit/tools/test_invoker.py
tests/unit/tools/conftest.py
src/ai_company/tools/invoker.py
src/ai_company/tools/examples/echo.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs

Files:

tests/unit/tools/test_registry.py
tests/unit/tools/test_errors.py
tests/unit/tools/test_base.py
tests/unit/tools/test_echo.py
tests/unit/tools/test_invoker.py
tests/unit/tools/conftest.py

src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Never use import logging, logging.getLogger(), or print() in application code — use the project's logger
Use event name constants from ai_company.observability.events rather than inline strings
Use structured logging format: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

src/ai_company/observability/events.py
src/ai_company/tools/errors.py
src/ai_company/tools/base.py
src/ai_company/tools/__init__.py
src/ai_company/tools/registry.py
src/ai_company/tools/invoker.py
src/ai_company/tools/examples/echo.py

pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Pin all dependency versions using == in pyproject.toml
Organize dependencies into groups: test (pytest + plugins), dev (includes test + ruff, mypy, pre-commit, commitizen, pydantic)

Files:

pyproject.toml

🧠 Learnings (10)

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings

Applied to files:

src/ai_company/observability/events.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/providers/**/*.py : Retryable errors include `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, and `ProviderInternalError` with `is_retryable=True`

Applied to files:

src/ai_company/tools/errors.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR level with context before raising

Applied to files:

src/ai_company/tools/errors.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/providers/**/*.py : Non-retryable errors raise immediately without retry

Applied to files:

src/ai_company/tools/errors.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-24T16:33:29.354Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

tests/unit/tools/conftest.py

🔇 Additional comments (14)

pyproject.toml (2)

154-156: LGTM!

The mypy override for jsonschema.* follows the established pattern for packages without complete type stubs, consistent with the existing litellm.* override.

17-17: jsonschema 4.26.0 is correctly configured.

The version is available on PyPI and properly pinned with == per coding guidelines. It's correctly placed in main dependencies as a runtime requirement.

src/ai_company/tools/examples/echo.py (1)

1-49: LGTM!

Clean reference implementation of BaseTool. The schema is well-defined with additionalProperties: False for strict validation, and the reliance on upstream validation by ToolInvoker is correctly documented in the base class contract.

src/ai_company/tools/__init__.py (1)

1-19: LGTM!

Clean public API surface with alphabetically sorted __all__. The re-exports provide convenient access to all tool system components from a single import path.

src/ai_company/tools/base.py (2)

16-42: LGTM!

Well-documented frozen model with clear note about shallow immutability of the metadata dict. The field defaults and descriptions are appropriate.

45-132: LGTM!

Solid design with defensive deep-copy on construction and property access. The to_definition() method correctly passes the internal schema reference to the frozen ToolDefinition, which is documented as read-only in its docstring. The abstract execute() contract clearly states that arguments are pre-validated by ToolInvoker.

tests/unit/tools/test_registry.py (1)

1-121: LGTM!

Comprehensive test coverage for ToolRegistry including edge cases (empty registry, non-string containment, unhashable types) and immutability verification. Tests are properly marked with @pytest.mark.unit and follow the established patterns.

tests/unit/tools/test_base.py (1)

1-145: LGTM!

Thorough test coverage for both ToolExecutionResult and BaseTool. The tests validate immutability, deep-copy semantics, name validation, and the abstract contract. The _ConcreteTool helper is a clean minimal implementation for testing.

tests/unit/tools/test_echo.py (1)

1-77: LGTM!

Excellent test coverage for EchoTool including unit tests for properties/execution and an integration test demonstrating the full pipeline (registry → invoker → result). The test uses vendor-agnostic IDs as required.

tests/unit/tools/test_errors.py (1)

1-98: LGTM!

Comprehensive test coverage for the error hierarchy including message storage, context immutability (both read-only and isolation from original dict), inheritance relationships, and string formatting. The tests validate the key design decisions around MappingProxyType for context protection.

tests/unit/tools/test_invoker.py (1)

15-229: Strong unit coverage for invoke() and invoke_all().

This test module exercises the key success/error and ordering paths with clear assertions on ToolResult shape and IDs.

src/ai_company/tools/errors.py (1)

12-56: Error hierarchy and immutable context handling look solid.

ToolError + specific subclasses are clear, and MappingProxyType protects context mutation as intended.

src/ai_company/observability/events.py (1)

111-122: Tool event constants are clean and consistent with the catalog.

Naming and grouping align with the existing observability convention and support the new tool lifecycle logs.

tests/unit/tools/conftest.py (1)

15-184: Fixture scaffolding is well-structured and reusable.

The private test tools plus sample_registry / sample_invoker fixtures provide clear, deterministic setup for tool-system unit tests.

src/ai_company/tools/invoker.py

src/ai_company/tools/registry.py

Source fixes: - Fix to_definition() schema leak: use property (deep copy) instead of raw _parameters_schema reference (Greptile, Gemini, type-analyzer) - Refactor invoke() into 4 private helpers to meet <50-line guideline (CodeRabbit) - Add logging before re-raising MemoryError/RecursionError (CodeRabbit) - Add catch-all except in _validate_params for SSRF/ref errors - Change TOOL_REGISTRY_BUILT log level from DEBUG to INFO (CodeRabbit) - Fix inconsistent log key name= to tool_name= in registry (Copilot) - Add debug log for TypeError in __contains__ (silent-failure-hunter) - Add logger to base.py, log before ValueError raise (logging-audit) - Fix execute() docstring: caveat no-schema case (comment-analyzer) - Fix registry module docstring "tuples" wording (comment-analyzer) - Remove TODO-like "future iteration" note from invoke_all (comment-analyzer) New tests (12 added, 77 total): - RecursionError re-raise propagation (2 tests) - jsonschema.SchemaError path with invalid tool schema (2 tests) - SSRF protection via blocked remote $ref resolution (2 tests) - Empty exception message fallback formatting (1 test) New event constants: - TOOL_INVOKE_NON_RECOVERABLE, TOOL_BASE_INVALID_NAME, TOOL_REGISTRY_CONTAINS_TYPE_ERROR All 1672 tests pass, mypy strict clean, ruff clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

src/ai_company/tools/invoker.py

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 57-59: Update the docstrings in src/ai_company/tools/invoker.py to
accurately describe the error contract: state that while most tool errors are
caught and converted to ToolResult(is_error=True), certain non-recoverable
exceptions are re-raised and not swallowed. Modify the docstrings for the public
entry points (e.g., invoke, any public ToolInvoker method) and any internal
methods that currently claim "all errors are caught" (referencing ToolResult and
the method names present in this file) so callers know which errors will be
returned as ToolResult and which will propagate.
- Around line 133-197: The _validate_params method is too large and should be
split into focused helpers: extract the jsonschema.validate call and try/except
into a new helper (e.g., _run_schema_validation(tool_call, schema) that returns
None or raises the underlying exception), and create small mapping helpers like
_map_schema_error(exc, tool_call) , _map_validation_error(exc, tool_call) and
_map_generic_error(exc, tool_call) that each log using TOOL_INVOKE_SCHEMA_ERROR
or TOOL_INVOKE_PARAMETER_ERROR and return the appropriate ToolResult (using
ToolResult and ToolParameterError) or None; then refactor _validate_params to
simply fetch schema, call _run_schema_validation with _SAFE_REGISTRY, catch
exceptions and delegate to the mapping helpers, keeping _validate_params under
50 lines.

In `@src/ai_company/tools/registry.py`:
- Around line 110-119: The __contains__ method currently only logs
TOOL_REGISTRY_CONTAINS_TYPE_ERROR on TypeError (unhashable inputs) but silently
returns False for hashable non-str inputs; update __contains__ (in class using
self._tools) to consistently emit the observability event whenever the queried
name is not a str (e.g., use isinstance(name, str) check) before membership
testing, calling logger.debug(TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
name_type=type(name).__name__) for non-str inputs, and then return False; keep
the existing TypeError handling for unhashable values.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2323a323-ac50-448c-a2d1-abb71379af44

📥 Commits

Reviewing files that changed from the base of the PR and between 54e1337 and e3edf25.

📒 Files selected for processing (6)

src/ai_company/observability/events.py
src/ai_company/tools/base.py
src/ai_company/tools/invoker.py
src/ai_company/tools/registry.py
tests/unit/tools/conftest.py
tests/unit/tools/test_invoker.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models with BaseModel, model_validator, and ConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with: from ai_company.observability import get_logger then logger = get_logger(__name__)
Always use variable name logger (not _logger, not log) for the logging instance
Pure data models, enums, and re-exports do not require logging

Files:

src/ai_company/tools/registry.py
src/ai_company/tools/invoker.py
src/ai_company/observability/events.py
tests/unit/tools/test_invoker.py
tests/unit/tools/conftest.py
src/ai_company/tools/base.py

src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Never use import logging, logging.getLogger(), or print() in application code — use the project's logger
Use event name constants from ai_company.observability.events rather than inline strings
Use structured logging format: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

src/ai_company/tools/registry.py
src/ai_company/tools/invoker.py
src/ai_company/observability/events.py
src/ai_company/tools/base.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs

Files:

tests/unit/tools/test_invoker.py
tests/unit/tools/conftest.py

🧠 Learnings (17)

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO level

Applied to files:

src/ai_company/tools/registry.py

📚 Learning: 2026-02-26T17:43:50.902Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/**/*.py : All functions and methods should have appropriate logging using `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

src/ai_company/tools/registry.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use structured logging format: `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`

Applied to files:

src/ai_company/tools/registry.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All functions and methods should have appropriate logging for debugging and traceability. Use `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

src/ai_company/tools/registry.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions

Applied to files:

src/ai_company/tools/registry.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14

Applied to files:

src/ai_company/tools/invoker.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

src/ai_company/tools/invoker.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines

Applied to files:

src/ai_company/tools/invoker.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR level with context before raising

Applied to files:

src/ai_company/tools/invoker.py

📚 Learning: 2026-01-24T09:54:45.426Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

src/ai_company/tools/invoker.py

📚 Learning: 2026-03-05T14:30:10.714Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings

Applied to files:

src/ai_company/observability/events.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-24T16:33:29.354Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

tests/unit/tools/conftest.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

tests/unit/tools/conftest.py

🔇 Additional comments (4)

src/ai_company/observability/events.py (1)

111-125: Tool lifecycle event constants look solid.

Clear, consistent event catalog extension for the tool subsystem.

tests/unit/tools/test_invoker.py (1)

12-336: Great invoker test coverage.

The suite exercises happy paths, error mapping, schema failures, SSRF blocking behavior, and non-recoverable propagation in a clean, structured way.

src/ai_company/tools/base.py (1)

63-118: Base abstractions are implemented cleanly.

Name validation, schema deep-copy protection, and to_definition() mapping are all in good shape for safe tool contracts.

tests/unit/tools/conftest.py (1)

15-282: Fixture scaffolding is strong and practical.

The private tool doubles and invoker/registry fixtures make edge-case testing straightforward and keep test setup isolated.

src/ai_company/tools/invoker.py

coderabbitai · 2026-03-05T16:31:56Z

src/ai_company/tools/registry.py

+    def __contains__(self, name: object) -> bool:
+        """Check whether a tool name is registered."""
+        try:
+            return name in self._tools
+        except TypeError:
+            logger.debug(
+                TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
+                name_type=type(name).__name__,
+            )
+            return False


⚠️ Potential issue | 🟡 Minor

Log non-string membership checks consistently.

__contains__ logs TOOL_REGISTRY_CONTAINS_TYPE_ERROR only when input is unhashable. Hashable non-str values (for example, 1) return False without the observability event.

Proposed fix

def __contains__(self, name: object) -> bool: """Check whether a tool name is registered.""" - try: - return name in self._tools - except TypeError: + if not isinstance(name, str): logger.debug( TOOL_REGISTRY_CONTAINS_TYPE_ERROR, name_type=type(name).__name__, ) return False + return name in self._tools

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def __contains__(self, name: object) -> bool:

"""Check whether a tool name is registered."""

try:

return name in self._tools

except TypeError:

logger.debug(

TOOL_REGISTRY_CONTAINS_TYPE_ERROR,

name_type=type(name).__name__,

)

return False

def __contains__(self, name: object) -> bool:

"""Check whether a tool name is registered."""

if not isinstance(name, str):

logger.debug(

TOOL_REGISTRY_CONTAINS_TYPE_ERROR,

name_type=type(name).__name__,

)

return False

return name in self._tools

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/registry.py` around lines 110 - 119, The __contains__ method currently only logs TOOL_REGISTRY_CONTAINS_TYPE_ERROR on TypeError (unhashable inputs) but silently returns False for hashable non-str inputs; update __contains__ (in class using self._tools) to consistently emit the observability event whenever the queried name is not a str (e.g., use isinstance(name, str) check) before membership testing, calling logger.debug(TOOL_REGISTRY_CONTAINS_TYPE_ERROR, name_type=type(name).__name__) for non-str inputs, and then return False; keep the existing TypeError handling for unhashable values.

…refactors) - Add MemoryError/RecursionError re-raise in _validate_params before catch-all - Use dedicated TOOL_INVOKE_VALIDATION_UNEXPECTED event for catch-all handler - Extract 3 error handler helpers to keep _validate_params under 50 lines - Update docstrings to document non-recoverable error propagation - Use isinstance check in ToolRegistry.__contains__ instead of try/except Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-03-05T16:49:23Z

src/ai_company/tools/invoker.py

+            return self._schema_error_result(tool_call, exc.message)
+        except jsonschema.ValidationError as exc:
+            return self._param_error_result(tool_call, exc.message)
+        except (MemoryError, RecursionError) as exc:


PEP 758 no-parentheses convention not applied

CLAUDE.md explicitly states the project convention: "PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14." Both MemoryError/RecursionError guard clauses still use the parenthesized form.

The same violation occurs at line 240.

Suggested change

except (MemoryError, RecursionError) as exc:

except MemoryError, RecursionError as exc:

Context Used: Rule from dashboard - CLAUDE.md (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ai_company/tools/invoker.py Line: 157 Comment: **PEP 758 no-parentheses convention not applied** `CLAUDE.md` explicitly states the project convention: *"PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14."* Both `MemoryError`/`RecursionError` guard clauses still use the parenthesized form. The same violation occurs at line 240. ```suggestion except MemoryError, RecursionError as exc: ``` **Context Used:** Rule from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=6816cd03-d0e1-4fd0-9d04-2417487a584c)) How can I resolve this? If you propose a fix, please make it concise.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-03-05T16:49:25Z

src/ai_company/tools/invoker.py

+        """Execute the tool, catching errors as ``ToolResult``."""
+        try:
+            return await tool.execute(arguments=dict(tool_call.arguments))
+        except (MemoryError, RecursionError) as exc:


PEP 758 no-parentheses convention not applied (second occurrence)

Same CLAUDE.md convention applies here — except A, B: without parentheses. See also line 157.

Suggested change

except (MemoryError, RecursionError) as exc:

except MemoryError, RecursionError as exc:

Context Used: Rule from dashboard - CLAUDE.md (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ai_company/tools/invoker.py Line: 240 Comment: **PEP 758 no-parentheses convention not applied (second occurrence)** Same `CLAUDE.md` convention applies here — `except A, B:` without parentheses. See also line 157. ```suggestion except MemoryError, RecursionError as exc: ``` **Context Used:** Rule from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=6816cd03-d0e1-4fd0-9d04-2417487a584c)) How can I resolve this? If you propose a fix, please make it concise.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-03-05T16:49:27Z

src/ai_company/tools/invoker.py

+        logger.exception(
+            TOOL_INVOKE_SCHEMA_ERROR,
+            tool_call_id=tool_call.id,
+            tool_name=tool_call.name,
+            error=error_msg,
+        )


logger.exception called outside the immediate except scope

_schema_error_result is a helper method called from within an except jsonschema.SchemaError block, and _unexpected_validation_result is called from except Exception. Python does propagate sys.exc_info() into called functions, so both calls work correctly today — the traceback is captured. However, this is a fragile pattern: if either helper is ever called from a non-except context in a future refactor (e.g., called proactively with a pre-built error message), logger.exception() will silently log NoneType: None as the exception info, making the log record misleading without any error at the call site.

A more explicit and resilient approach is to accept the live exception as a parameter:

def _schema_error_result( self, tool_call: ToolCall, error_msg: str, exc: Exception | None = None, ) -> ToolResult: logger.error( TOOL_INVOKE_SCHEMA_ERROR, ..., exc_info=exc, )

This makes the dependency on exception context explicit and prevents future accidental misuse.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ai_company/tools/invoker.py Line: 176-181 Comment: **`logger.exception` called outside the immediate `except` scope** `_schema_error_result` is a helper method called from within an `except jsonschema.SchemaError` block, and `_unexpected_validation_result` is called from `except Exception`. Python does propagate `sys.exc_info()` into called functions, so both calls work correctly today — the traceback is captured. However, this is a fragile pattern: if either helper is ever called from a non-except context in a future refactor (e.g., called proactively with a pre-built error message), `logger.exception()` will silently log `NoneType: None` as the exception info, making the log record misleading without any error at the call site. A more explicit and resilient approach is to accept the live exception as a parameter: ```python def _schema_error_result( self, tool_call: ToolCall, error_msg: str, exc: Exception | None = None, ) -> ToolResult: logger.error( TOOL_INVOKE_SCHEMA_ERROR, ..., exc_info=exc, ) ``` This makes the dependency on exception context explicit and prevents future accidental misuse. How can I resolve this? If you propose a fix, please make it concise.

Copilot

Pull request overview

Copilot reviewed 14 out of 17 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T19:31:14Z

src/ai_company/tools/invoker.py

+        Returns:
+            Tuple of results in the same order as the input.
+        """
+        return tuple([await self.invoke(call) for call in tool_calls])


This builds an intermediate list before converting to a tuple. For large tool call batches, avoid the extra allocation by accumulating results in a preallocated list + converting once, or by using a direct tuple construction approach that doesn’t materialize an intermediate list.

Copilot · 2026-03-05T19:31:15Z

src/ai_company/tools/errors.py

+    def __init__(
+        self,
+        message: str,
+        *,
+        context: dict[str, Any] | None = None,
+    ) -> None:


The context parameter is typed as dict[str, Any], but the implementation immediately copies it and stores an immutable mapping. Accepting a broader Mapping[str, Any] | None (and documenting that it’s copied) would make the API more flexible for callers (e.g., when they already have an immutable mapping) without changing runtime behavior.

Copilot · 2026-03-05T19:31:15Z

src/ai_company/tools/registry.py

+            logger.warning(
+                TOOL_NOT_FOUND,
+                tool_name=name,
+                available=available,
+            )


A tool-miss is logged here (TOOL_NOT_FOUND) and then logged again in ToolInvoker._lookup_tool (TOOL_INVOKE_NOT_FOUND). Consider consolidating responsibility for logging misses in one layer (registry or invoker) to reduce duplicate log volume and make alerting/dashboards less noisy.

Suggested change

logger.warning(

TOOL_NOT_FOUND,

tool_name=name,

available=available,

)

Copilot AI review requested due to automatic review settings March 5, 2026 16:04

Copilot started reviewing on behalf of Aureliolo March 5, 2026 16:05 View session

greptile-apps bot reviewed Mar 5, 2026

View reviewed changes

src/ai_company/tools/base.py Show resolved Hide resolved

Copilot AI reviewed Mar 5, 2026

View reviewed changes

gemini-code-assist bot reviewed Mar 5, 2026

View reviewed changes

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

src/ai_company/tools/invoker.py Show resolved Hide resolved

src/ai_company/tools/invoker.py Outdated Show resolved Hide resolved

src/ai_company/tools/registry.py Outdated Show resolved Hide resolved

greptile-apps bot reviewed Mar 5, 2026

View reviewed changes

src/ai_company/tools/invoker.py Outdated Show resolved Hide resolved

src/ai_company/tools/invoker.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings March 5, 2026 16:44

Aureliolo merged commit c51068b into main Mar 5, 2026
1 of 2 checks passed

Aureliolo deleted the feat/tool-system branch March 5, 2026 16:45

greptile-apps bot reviewed Mar 5, 2026

View reviewed changes

Copilot AI reviewed Mar 5, 2026

View reviewed changes

-        return tuple([await self.invoke(call) for call in tool_calls])
+        results: list[ToolResult] = []
+        for call in tool_calls:
+            results.append(await self.invoke(call))
+        return tuple(results)

	except MemoryError, RecursionError:
	except (MemoryError, RecursionError):

	parameters_schema=self._parameters_schema or {},
	parameters_schema=copy.deepcopy(self._parameters_schema) if self._parameters_schema is not None else {},

	except (MemoryError, RecursionError) as exc:
	except MemoryError, RecursionError as exc:

Conversation

Aureliolo commented Mar 5, 2026

Summary

Key Design Decisions

Test Plan

Review Coverage

Uh oh!

github-actions bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

OpenSSF Scorecard

Scanned Files

Uh oh!

coderabbitai bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist bot commented Mar 5, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

greptile-apps bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 5, 2026 •

edited

Loading

coderabbitai bot commented Mar 5, 2026 •

edited

Loading

greptile-apps bot commented Mar 5, 2026 •

edited

Loading