Skip to content

feat: implement basic tool system (registry, invocation, results) (#15)#104

Merged
Aureliolo merged 3 commits intomainfrom
feat/tool-system
Mar 5, 2026
Merged

feat: implement basic tool system (registry, invocation, results) (#15)#104
Aureliolo merged 3 commits intomainfrom
feat/tool-system

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Implements the application-level tool system that bridges LLM ToolCall objects with actual tool execution (Issue #15, Design Spec §11.1-11.2).

  • Error hierarchy: ToolErrorToolNotFoundError, ToolParameterError, ToolExecutionError with immutable MappingProxyType context
  • BaseTool ABC: Abstract base with ToolExecutionResult frozen Pydantic model, deep-copy schema protection, to_definition() for LLM integration
  • ToolRegistry: Immutable MappingProxyType-backed registry with duplicate rejection, sorted tool listing, and to_definitions() for provider layer
  • ToolInvoker: Validates parameters via JSON Schema (with SSRF-safe $ref resolution), executes tools, never propagates exceptions — all errors become ToolResult(is_error=True)
  • EchoTool: Reference implementation demonstrating the tool contract
  • 10 observability event constants for structured logging across the tool system
  • jsonschema==4.26.0 added as explicit dependency (was only transitive via litellm)

Key Design Decisions

  • ToolInvoker uses referencing.Registry with a custom retrieve function that raises NoSuchResource to block remote $ref resolution (SSRF prevention)
  • except MemoryError, RecursionError: raise before except Exception ensures non-recoverable errors propagate (PEP 758 syntax)
  • BaseTool.parameters_schema property returns copy.deepcopy() to prevent external mutation
  • invoke_all() executes sequentially (parallel execution deferred to future iteration)

Test Plan

  • 65 unit tests covering errors, base, registry, invoker, and echo tool
  • All 1665 tests pass (pytest tests/ -n auto)
  • 95.06% overall coverage (80% minimum enforced)
  • mypy strict passes on all source and test files
  • ruff lint + format clean

Review Coverage

Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, security-reviewer). 17 findings addressed:

  • SSRF prevention in JSON Schema validation
  • SchemaError vs ValidationError separation
  • MemoryError/RecursionError re-raise
  • Schema property deep copy protection
  • Log level corrections (warning vs error)
  • Soft error propagation path coverage
  • Unhashable type containment check
  • Empty error message fallback

Closes #15

🤖 Generated with Claude Code

Adds the application-level tool system that bridges LLM tool calls
with actual tool execution:

- ToolError hierarchy (ToolNotFoundError, ToolParameterError, ToolExecutionError)
- BaseTool ABC with ToolExecutionResult frozen model
- ToolRegistry with immutable MappingProxyType backing
- ToolInvoker with JSON Schema validation and SSRF-safe $ref resolution
- EchoTool reference implementation
- 65 unit tests with near-100% coverage on new modules

Pre-reviewed by 8 agents, 17 findings addressed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 16:04
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 5, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

PackageVersionScoreDetails
pip/jsonschema 4.26.0 UnknownUnknown

Scanned Files

  • pyproject.toml

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 5, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7df468d-61ac-4dc0-9b48-bbd01a009d01

📥 Commits

Reviewing files that changed from the base of the PR and between e3edf25 and 88bc06f.

📒 Files selected for processing (3)
  • src/ai_company/observability/events.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/registry.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Comprehensive tool framework: register, validate, and invoke tools with lifecycle events and an example echo tool.
  • Error Handling
    • Structured tool error model with immutable context and clear error types; consistent error/results mapping from invocations.
  • Integration
    • Tool invocation engine supporting single and sequential calls, parameter validation, and robust handling of execution outcomes.
  • Tests
    • Extensive unit test suite covering registry, invoker, tools, and error behaviors.
  • Chores
    • Added jsonschema dependency and a type-check override for it.

Walkthrough

Adds a new tool subsystem: immutable ToolExecutionResult and BaseTool, a ToolError hierarchy, an immutable ToolRegistry, a ToolInvoker with JSON Schema validation and SSRF blocking, an EchoTool example, observability event constants, jsonschema dependency, and comprehensive unit tests and fixtures.

Changes

Cohort / File(s) Summary
Project metadata & observability
pyproject.toml, src/ai_company/observability/events.py
Added jsonschema==4.26.0 and mypy ignore for jsonschema.*. Introduced tool lifecycle event constants (TOOL_REGISTRY_BUILT, TOOL_REGISTRY_DUPLICATE, TOOL_NOT_FOUND, TOOL_INVOKE_*, TOOL_BASE_INVALID_NAME, TOOL_REGISTRY_CONTAINS_TYPE_ERROR).
Core types & exports
src/ai_company/tools/__init__.py, src/ai_company/tools/base.py
Added ToolExecutionResult (frozen model) and BaseTool ABC with name/description/parameters_schema and to_definition()/abstract execute(). Re-exported core tool symbols in package __all__.
Errors
src/ai_company/tools/errors.py
New ToolError base class storing an immutable context and subclasses ToolNotFoundError, ToolParameterError, ToolExecutionError.
Registry
src/ai_company/tools/registry.py
New immutable ToolRegistry built from Iterable[BaseTool]; rejects duplicates, logs events, provides get(), list_tools(), to_definitions(), __contains__() with type checks, and __len__(). Raises ToolNotFoundError for missing tools.
Invoker
src/ai_company/tools/invoker.py
New ToolInvoker with invoke() and invoke_all(); looks up tools, validates arguments against JSON Schema with remote-$ref blocking, executes tools, maps exceptions to ToolResult (re-raises non-recoverable errors), and emits structured logs/events.
Example tool
src/ai_company/tools/examples/echo.py
Added EchoTool implementing BaseTool with a required message parameter and async execute() that echoes the message.
Tests & fixtures
tests/unit/tools/conftest.py, tests/unit/tools/test_base.py, tests/unit/tools/test_echo.py, tests/unit/tools/test_errors.py, tests/unit/tools/test_invoker.py, tests/unit/tools/test_registry.py
Added extensive fixtures and unit tests covering base types, error hierarchy, registry behaviors (duplicates, immutability, lookups), invoker flows (validation, not-found, execution errors, SSRF protection, non-recoverable errors), and EchoTool integration.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant Registry as ToolRegistry
    participant Tool as BaseTool

    Client->>Invoker: invoke(tool_call)
    Invoker->>Registry: get(tool_name)
    alt tool found
        Registry-->>Invoker: BaseTool
        Invoker->>Invoker: validate(arguments, schema) (jsonschema, no-remote-$ref)
        alt validation OK
            Invoker->>Tool: execute(arguments)
            Tool-->>Invoker: ToolExecutionResult
            Invoker-->>Client: ToolResult (is_error: false)
        else validation failed
            Invoker-->>Client: ToolResult (is_error: true, validation error)
        end
    else not found
        Registry-->>Invoker: raises ToolNotFoundError
        Invoker-->>Client: ToolResult (is_error: true, not found)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.23% which is insufficient. The required threshold is 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: implement basic tool system (registry, invocation, results)' accurately summarizes the main change: a comprehensive tool system with registry, invoker, and result handling.
Description check ✅ Passed The pull request description provides detailed context about the tool system implementation, design decisions, test coverage, and links to Issue #15, directly related to the changeset.
Linked Issues check ✅ Passed All acceptance criteria from Issue #15 are met: tool base class (BaseTool ABC), registry with register/lookup/list (ToolRegistry), invoker with validation (ToolInvoker), structured result format (ToolExecutionResult, ToolResult), JSON Schema support, EchoTool example, 65+ unit tests, and error handling.
Out of Scope Changes check ✅ Passed All changes directly support Issue #15 requirements: error hierarchy, base tool abstraction, registry, invoker, observability events, and dependencies. Test fixtures and example implementations are in-scope for the tool system.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/tool-system

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational tool system that enables the application to effectively manage and execute tools based on LLM requests. It establishes clear abstractions for tools, robust error handling, and secure parameter validation, providing a reliable and extensible framework for integrating external functionalities.

Highlights

  • Tool System Implementation: Implemented the core application-level tool system, bridging LLM ToolCall objects with actual tool execution, as outlined in Issue Design and implement basic tool system (registry, invocation, results) #15 and Design Spec §11.1-11.2.
  • Robust Error Handling: Introduced a comprehensive error hierarchy (ToolError -> ToolNotFoundError, ToolParameterError, ToolExecutionError) with immutable MappingProxyType context for structured metadata.
  • BaseTool Abstraction: Defined BaseTool as an Abstract Base Class (ABC) with a ToolExecutionResult frozen Pydantic model, deep-copy schema protection, and a to_definition() method for LLM integration.
  • Tool Registry and Invoker: Developed ToolRegistry for immutable, duplicate-rejecting tool storage and ToolInvoker for validating parameters via JSON Schema (with SSRF-safe $ref resolution) and executing tools, ensuring errors are always returned as ToolResult(is_error=True).
  • Observability and Dependencies: Added 10 new observability event constants for structured logging within the tool system and explicitly included jsonschema==4.26.0 as a direct dependency.
  • Security and Stability: Implemented SSRF prevention in JSON Schema validation and ensured non-recoverable errors like MemoryError and RecursionError propagate, rather than being caught by the invoker's general exception handling.
Changelog
  • pyproject.toml
    • Added jsonschema==4.26.0 to the project's dependencies.
    • Configured MyPy to ignore missing imports for the jsonschema module.
  • src/ai_company/observability/events.py
    • Introduced 10 new Final[str] constants for logging events related to the tool lifecycle, including registry operations, invocation start/success, and various error types.
  • src/ai_company/tools/init.py
    • Created the __init__.py file to define the tools package.
    • Exported BaseTool, ToolExecutionResult, ToolError, ToolExecutionError, ToolNotFoundError, ToolParameterError, ToolInvoker, ToolRegistry, and EchoTool for public access.
  • src/ai_company/tools/base.py
    • Added ToolExecutionResult Pydantic model to standardize tool output, including content, error status, and optional metadata.
    • Implemented BaseTool as an abstract base class for all tools, enforcing execute method implementation.
    • Ensured tool parameters_schema is deep-copied on initialization and when accessed via property to prevent external mutation.
  • src/ai_company/tools/errors.py
    • Created a hierarchy of custom exceptions for tool-related errors: ToolError (base), ToolNotFoundError, ToolParameterError, and ToolExecutionError.
    • Designed ToolError to carry an immutable MappingProxyType context for structured error details.
  • src/ai_company/tools/examples/echo.py
    • Added EchoTool as a reference implementation of BaseTool, demonstrating how to define a tool with a fixed schema and execute method.
  • src/ai_company/tools/invoker.py
    • Implemented ToolInvoker responsible for looking up tools, validating arguments against JSON Schema, and executing tool methods.
    • Integrated a custom _no_remote_retrieve function with referencing.Registry to prevent Server-Side Request Forgery (SSRF) during JSON Schema $ref resolution.
    • Ensured ToolInvoker catches all tool-related exceptions and returns them as ToolResult(is_error=True), preventing exceptions from propagating.
    • Added specific logging events for various stages and outcomes of tool invocation.
  • src/ai_company/tools/registry.py
    • Implemented ToolRegistry as an immutable container for BaseTool instances, preventing duplicate tool names.
    • Provided methods for retrieving tools by name, listing registered tool names, and converting tools to ToolDefinition objects for LLM providers.
    • Integrated logging for registry build and duplicate tool warnings.
  • tests/unit/tools/conftest.py
    • Added a new conftest.py file to provide pytest fixtures for the tool system.
    • Defined concrete test tool implementations (_EchoTestTool, _FailingTool, _NoSchemaTool, _StrictSchemaTool, _SoftErrorTool) for comprehensive testing.
    • Provided fixtures for ToolRegistry and ToolInvoker instances pre-populated with test tools.
  • tests/unit/tools/test_base.py
    • Added unit tests for ToolExecutionResult to verify its immutability and data handling.
    • Implemented unit tests for BaseTool to confirm property behavior, name validation, schema deep-copying, and to_definition conversion.
  • tests/unit/tools/test_echo.py
    • Added unit tests for EchoTool to verify its properties, execution logic, and integration within the full tool pipeline (registry -> invoker -> tool call).
  • tests/unit/tools/test_errors.py
    • Added unit tests for the ToolError hierarchy, verifying message and context storage, string representation, and proper subclassing.
  • tests/unit/tools/test_invoker.py
    • Added extensive unit tests for ToolInvoker covering successful invocations, handling of not-found tools, parameter validation (valid, invalid, missing, extra parameters), soft errors, and execution errors.
    • Included tests for the invoke_all method to ensure correct sequential execution and error handling for multiple tool calls.
  • tests/unit/tools/test_registry.py
    • Added unit tests for ToolRegistry to verify its behavior with empty, single, and multiple tool registrations.
    • Confirmed correct handling of duplicate tool names, tool lookup, and conversion to ToolDefinition objects.
    • Tested the immutability of the registry's internal tool mapping.
  • uv.lock
    • Updated the uv.lock file to include jsonschema as a dependency, ensuring consistent environment builds.
Activity
  • The pull request implements the application-level tool system, addressing Issue Design and implement basic tool system (registry, invocation, results) #15 and following Design Spec §11.1-11.2.
  • The author performed a comprehensive test plan, including 65 unit tests covering various components of the tool system.
  • All 1665 existing tests passed, and the overall code coverage reached 95.06%, exceeding the 80% minimum.
  • MyPy strict type checking passed on all source and test files, and Ruff linting and formatting checks were clean.
  • The changes underwent pre-review by 8 different AI agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, security-reviewer).
  • 17 findings from the pre-review were addressed, including SSRF prevention, error separation, memory/recursion error re-raise, schema property deep copy protection, log level corrections, soft error propagation, unhashable type containment, and empty error message fallbacks.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR implements the complete application-level tool system (Issue #15, Design Spec §11.1–11.2), bridging LLM ToolCall objects with concrete tool execution through a clean four-layer stack: ToolError hierarchy → BaseTool ABC → ToolRegistryToolInvoker. The implementation is solid and production-ready with thorough test coverage (65 tests, 95% coverage), SSRF-safe JSON Schema validation via a blocking referencing.Registry, and a well-reasoned non-recoverable error propagation contract.

  • Three previously flagged issues are fully resolved: to_definition() now uses the deep-copy parameters_schema property; _validate_params correctly re-raises MemoryError/RecursionError; and the TOOL_INVOKE_VALIDATION_UNEXPECTED constant correctly separates unexpected validation failures from SchemaError in the observability pipeline.
  • PEP 758 convention: Both except (MemoryError, RecursionError) as exc: clauses in invoker.py (lines 157, 240) use the parenthesized form. CLAUDE.md mandates except A, B: without parentheses for Python 3.14.
  • logger.exception in helper methods: _schema_error_result and _unexpected_validation_result call logger.exception() while relying on implicit exception-context propagation from their callers' except blocks. This works today, but passing the live exception explicitly (or using logger.error(..., exc_info=exc)) would make the dependency explicit and prevent silent misuse if these helpers are ever called outside an except handler.

Confidence Score: 4/5

  • Safe to merge; the two style findings are minor and do not affect correctness or runtime behavior.
  • The core logic is correct — SSRF prevention is tested and functional, non-recoverable errors propagate as documented, and the deep-copy schema protection is consistently applied. The only deductions are two instances of the parenthesized except syntax (CLAUDE.md convention violation) and a fragile implicit-exception-context pattern in the logging helper methods, neither of which causes a runtime bug.
  • src/ai_company/tools/invoker.py — lines 157, 176–181, and 240 warrant the small clean-ups described in comments.

Important Files Changed

Filename Overview
src/ai_company/tools/invoker.py Core invocation engine with SSRF-safe JSON Schema validation, non-recoverable error re-raise, and structured error results; two minor style issues (parenthesized except syntax, fragile logger.exception in helpers).
src/ai_company/tools/base.py Clean ABC with deep-copy schema protection on both the property and to_definition(); frozen ToolExecutionResult model with acknowledged shallow-copy metadata caveat.
src/ai_company/tools/registry.py Immutable MappingProxyType-backed registry with duplicate rejection, sorted listing, and safe contains type guard; no issues found.
src/ai_company/tools/errors.py Well-structured error hierarchy with immutable MappingProxyType context and clear str formatting; no issues found.
src/ai_company/observability/events.py Adds 14 new Final[str] event constants for tool lifecycle covering registry, lookup, validation, execution, and non-recoverable error paths; all namespaced consistently under tool.*.
tests/unit/tools/conftest.py Comprehensive test fixtures covering success, soft-error, hard-error, recursion, invalid schema, SSRF, and empty-message edge cases; all fixtures use vendor-agnostic names per CLAUDE.md.

Sequence Diagram

sequenceDiagram
    participant LLM as LLM Provider
    participant TI as ToolInvoker
    participant TR as ToolRegistry
    participant BT as BaseTool
    participant JS as jsonschema (SSRF-safe)

    LLM->>TI: invoke(ToolCall)
    TI->>TR: get(tool_call.name)
    alt Tool not found
        TR-->>TI: raises ToolNotFoundError
        TI-->>LLM: ToolResult(is_error=True)
    else Tool found
        TR-->>TI: BaseTool instance
        TI->>JS: validate(arguments, schema, _SAFE_REGISTRY)
        alt SchemaError (invalid schema def)
            JS-->>TI: raises SchemaError
            TI-->>LLM: ToolResult(is_error=True)
        else ValidationError (bad arguments)
            JS-->>TI: raises ValidationError
            TI-->>LLM: ToolResult(is_error=True)
        else MemoryError / RecursionError
            JS-->>TI: raises non-recoverable
            TI-->>LLM: re-raises (propagates)
        else Unexpected exception
            JS-->>TI: raises Exception
            TI-->>LLM: ToolResult(is_error=True)
        else Validation passed
            JS-->>TI: None (success)
            TI->>BT: execute(arguments=...)
            alt Execution raises MemoryError/RecursionError
                BT-->>TI: raises non-recoverable
                TI-->>LLM: re-raises (propagates)
            else Execution raises Exception
                BT-->>TI: raises Exception
                TI-->>LLM: ToolResult(is_error=True)
            else Execution returns result
                BT-->>TI: ToolExecutionResult
                TI-->>LLM: ToolResult(content, is_error)
            end
        end
    end
Loading

Last reviewed commit: 88bc06f

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the core “tool system” layer that maps provider ToolCalls to registered application tools, with JSON Schema validation, structured results, and observability events (Issue #15 / Design Spec §11.1–11.2).

Changes:

  • Added BaseTool + ToolExecutionResult, tool error hierarchy, and an EchoTool example implementation.
  • Added ToolRegistry (immutable mapping) and ToolInvoker (schema validation + safe execution returning ToolResult on failures).
  • Added tool-related observability event constants and pinned jsonschema==4.26.0 as a direct dependency; included unit test suite for the tool subsystem.

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
uv.lock Adds jsonschema to the locked dependencies.
pyproject.toml Pins jsonschema==4.26.0 and adds a mypy override for jsonschema.*.
src/ai_company/tools/base.py Introduces BaseTool abstraction and ToolExecutionResult.
src/ai_company/tools/errors.py Adds ToolError hierarchy with immutable context mapping.
src/ai_company/tools/registry.py Implements immutable ToolRegistry with lookup/list/definitions.
src/ai_company/tools/invoker.py Implements tool invocation with JSON Schema validation and error-to-ToolResult conversion.
src/ai_company/tools/examples/echo.py Adds a reference EchoTool implementation.
src/ai_company/tools/init.py Exposes tool system public API surface.
src/ai_company/observability/events.py Adds tool lifecycle event constants for structured logging.
tests/unit/tools/conftest.py Adds fixtures and test-only tools for invoker/registry testing.
tests/unit/tools/test_base.py Unit tests for BaseTool and ToolExecutionResult.
tests/unit/tools/test_errors.py Unit tests for error hierarchy and context immutability.
tests/unit/tools/test_registry.py Unit tests for registry behavior and immutability.
tests/unit/tools/test_invoker.py Unit tests for invoker happy-path, validation, and failure modes.
tests/unit/tools/test_echo.py Unit + integration-style tests for EchoTool pipeline.
tests/unit/tools/init.py Adds tools test package marker.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +155 to +160
try:
result = await tool.execute(arguments=dict(tool_call.arguments))
except MemoryError, RecursionError:
raise
except Exception as exc:
error_msg = str(exc) or f"{type(exc).__name__} (no message)"
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invoke() catches all Exception from tool.execute(...), which will also catch asyncio.CancelledError when it subclasses Exception (common in recent Python versions). That breaks task cancellation semantics and contradicts the module docstring claim that CancelledError propagates. Add an explicit except asyncio.CancelledError: raise (and keep it before the broad except Exception) so cancellations are not converted into ToolResult errors.

Copilot uses AI. Check for mistakes.
available = sorted(self._tools) or ["(none)"]
logger.warning(
TOOL_NOT_FOUND,
name=name,
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TOOL_NOT_FOUND logging, this uses the field key name=... while the rest of the tool logging consistently uses tool_name=... (e.g., all TOOL_INVOKE_* events). Using inconsistent keys makes log filtering/analytics harder; consider renaming name to tool_name here for consistency.

Suggested change
name=name,
tool_name=name,

Copilot uses AI. Check for mistakes.
Returns:
Tuple of results in the same order as the input.
"""
return tuple([await self.invoke(call) for call in tool_calls])
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invoke_all() builds a list via comprehension and then immediately converts it to a tuple (tuple([ ... ])), which does an extra allocation/copy. Prefer collecting results in a list and returning tuple(results), or other approaches that avoid the intermediate list-to-tuple conversion overhead.

Suggested change
return tuple([await self.invoke(call) for call in tool_calls])
results: list[ToolResult] = []
for call in tool_calls:
results.append(await self.invoke(call))
return tuple(results)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive and well-designed tool system, including an abstract base class for tools, a registry for managing them, and an invoker for executing them. While the implementation demonstrates strong attention to security (SSRF prevention), there are critical issues related to internal state mutation, version-dependent exception handling that could lead to a SyntaxError in some Python versions, and potential information exposure through error messages. Addressing these will ensure the system's reliability and security.


try:
result = await tool.execute(arguments=dict(tool_call.arguments))
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The except MemoryError, RecursionError: syntax is problematic. While it's valid in Python 3.13+ (PEP 758), in earlier Python 3 versions, it's interpreted as except MemoryError as RecursionError:. This means RecursionError is not caught as intended and the RecursionError built-in class name is shadowed. This could lead to SyntaxError in some Python 3 environments or incorrect exception handling, contradicting the design goal of propagating non-recoverable errors.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

return ToolDefinition(
name=self._name,
description=self._description,
parameters_schema=self._parameters_schema or {},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The to_definition method returns a ToolDefinition object that contains a reference to the tool's internal parameters_schema dictionary. Since dictionaries are mutable in Python, any external modification to the parameters_schema in the returned ToolDefinition will directly mutate the tool's internal state. This can lead to unexpected behavior or security bypasses if the schema is used for input validation in other parts of the system (like the ToolInvoker).

Suggested change
parameters_schema=self._parameters_schema or {},
parameters_schema=copy.deepcopy(self._parameters_schema) if self._parameters_schema is not None else {},

except MemoryError, RecursionError:
raise
except Exception as exc:
error_msg = str(exc) or f"{type(exc).__name__} (no message)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

When a tool execution fails, the raw exception message (str(exc)) is captured and returned to the LLM as part of the ToolResult. These error messages may contain sensitive information about the system's internal state, database schemas, or third-party service details, depending on the tool's implementation. An attacker could potentially use the LLM to trigger specific errors and exfiltrate this information.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 77-194: The invoke() method is too large; split it into small
private helpers: implement _get_tool(tool_name, tool_call_id) to perform
registry lookup and handle ToolNotFoundError, _validate_parameters(tool,
tool_call) to run jsonschema.validate and return ToolParameterError/ToolResult
on SchemaError/ValidationError, _execute_tool(tool, tool_call) to call await
tool.execute(...) and translate exceptions into ToolExecutionError (preserve the
MemoryError/RecursionError re-raise behavior), and
_map_execution_result(tool_call, result) to log
TOOL_INVOKE_TOOL_ERROR/TOOL_INVOKE_SUCCESS and return the final ToolResult; then
refactor invoke() to orchestrate these helpers so each helper is under ~50 lines
and invoke() becomes a thin coordinator calling _get_tool, _validate_parameters,
_execute_tool, and _map_execution_result.
- Around line 157-158: The except clause in invoke that catches MemoryError and
RecursionError re-raises them silently; update that clause in
src/ai_company/tools/invoker.py (function invoke) to log an ERROR-level message
with contextual details (including input args/state and exception info) before
re-raising, using the module's existing logger object (e.g., logger) and
exc_info=True; after adding the logging, refactor invoke into smaller helpers
(validate_params, execute_tool, handle_results, and handle_errors) to keep each
function under ~50 lines and separate validation, execution, and error-logging
responsibilities.

In `@src/ai_company/tools/registry.py`:
- Around line 64-68: The state-transition log using TOOL_REGISTRY_BUILT is
currently emitted with logger.debug; change that call to logger.info so the
event is logged at INFO level (replace the logger.debug(...) call that passes
TOOL_REGISTRY_BUILT, tool_count=len(self._tools), tools=sorted(self._tools) with
logger.info while preserving the same message and keyword args) — locate this in
the ToolRegistry/build (or the method containing the TOOL_REGISTRY_BUILT
emission) in registry.py and update the logging level only.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2aa8c784-cbed-4316-ac43-ffec8d080bef

📥 Commits

Reviewing files that changed from the base of the PR and between 92e9023 and 54e1337.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • pyproject.toml
  • src/ai_company/observability/events.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/base.py
  • src/ai_company/tools/errors.py
  • src/ai_company/tools/examples/__init__.py
  • src/ai_company/tools/examples/echo.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/registry.py
  • tests/unit/tools/__init__.py
  • tests/unit/tools/conftest.py
  • tests/unit/tools/test_base.py
  • tests/unit/tools/test_echo.py
  • tests/unit/tools/test_errors.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/test_registry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models with BaseModel, model_validator, and ConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with: from ai_company.observability import get_logger then logger = get_logger(__name__)
Always use variable name logger (not _logger, not log) for the logging instance
Pure data models, enums, and re-exports do not require logging

Files:

  • tests/unit/tools/test_registry.py
  • tests/unit/tools/test_errors.py
  • src/ai_company/observability/events.py
  • src/ai_company/tools/errors.py
  • src/ai_company/tools/base.py
  • src/ai_company/tools/__init__.py
  • tests/unit/tools/test_base.py
  • tests/unit/tools/test_echo.py
  • src/ai_company/tools/registry.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/conftest.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/examples/echo.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs

Files:

  • tests/unit/tools/test_registry.py
  • tests/unit/tools/test_errors.py
  • tests/unit/tools/test_base.py
  • tests/unit/tools/test_echo.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/conftest.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Never use import logging, logging.getLogger(), or print() in application code — use the project's logger
Use event name constants from ai_company.observability.events rather than inline strings
Use structured logging format: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

  • src/ai_company/observability/events.py
  • src/ai_company/tools/errors.py
  • src/ai_company/tools/base.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/registry.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/examples/echo.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Pin all dependency versions using == in pyproject.toml
Organize dependencies into groups: test (pytest + plugins), dev (includes test + ruff, mypy, pre-commit, commitizen, pydantic)

Files:

  • pyproject.toml
🧠 Learnings (10)
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings

Applied to files:

  • src/ai_company/observability/events.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/providers/**/*.py : Retryable errors include `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, and `ProviderInternalError` with `is_retryable=True`

Applied to files:

  • src/ai_company/tools/errors.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR level with context before raising

Applied to files:

  • src/ai_company/tools/errors.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/providers/**/*.py : Non-retryable errors raise immediately without retry

Applied to files:

  • src/ai_company/tools/errors.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

  • tests/unit/tools/conftest.py
🔇 Additional comments (14)
pyproject.toml (2)

154-156: LGTM!

The mypy override for jsonschema.* follows the established pattern for packages without complete type stubs, consistent with the existing litellm.* override.


17-17: jsonschema 4.26.0 is correctly configured.

The version is available on PyPI and properly pinned with == per coding guidelines. It's correctly placed in main dependencies as a runtime requirement.

src/ai_company/tools/examples/echo.py (1)

1-49: LGTM!

Clean reference implementation of BaseTool. The schema is well-defined with additionalProperties: False for strict validation, and the reliance on upstream validation by ToolInvoker is correctly documented in the base class contract.

src/ai_company/tools/__init__.py (1)

1-19: LGTM!

Clean public API surface with alphabetically sorted __all__. The re-exports provide convenient access to all tool system components from a single import path.

src/ai_company/tools/base.py (2)

16-42: LGTM!

Well-documented frozen model with clear note about shallow immutability of the metadata dict. The field defaults and descriptions are appropriate.


45-132: LGTM!

Solid design with defensive deep-copy on construction and property access. The to_definition() method correctly passes the internal schema reference to the frozen ToolDefinition, which is documented as read-only in its docstring. The abstract execute() contract clearly states that arguments are pre-validated by ToolInvoker.

tests/unit/tools/test_registry.py (1)

1-121: LGTM!

Comprehensive test coverage for ToolRegistry including edge cases (empty registry, non-string containment, unhashable types) and immutability verification. Tests are properly marked with @pytest.mark.unit and follow the established patterns.

tests/unit/tools/test_base.py (1)

1-145: LGTM!

Thorough test coverage for both ToolExecutionResult and BaseTool. The tests validate immutability, deep-copy semantics, name validation, and the abstract contract. The _ConcreteTool helper is a clean minimal implementation for testing.

tests/unit/tools/test_echo.py (1)

1-77: LGTM!

Excellent test coverage for EchoTool including unit tests for properties/execution and an integration test demonstrating the full pipeline (registry → invoker → result). The test uses vendor-agnostic IDs as required.

tests/unit/tools/test_errors.py (1)

1-98: LGTM!

Comprehensive test coverage for the error hierarchy including message storage, context immutability (both read-only and isolation from original dict), inheritance relationships, and string formatting. The tests validate the key design decisions around MappingProxyType for context protection.

tests/unit/tools/test_invoker.py (1)

15-229: Strong unit coverage for invoke() and invoke_all().

This test module exercises the key success/error and ordering paths with clear assertions on ToolResult shape and IDs.

src/ai_company/tools/errors.py (1)

12-56: Error hierarchy and immutable context handling look solid.

ToolError + specific subclasses are clear, and MappingProxyType protects context mutation as intended.

src/ai_company/observability/events.py (1)

111-122: Tool event constants are clean and consistent with the catalog.

Naming and grouping align with the existing observability convention and support the new tool lifecycle logs.

tests/unit/tools/conftest.py (1)

15-184: Fixture scaffolding is well-structured and reusable.

The private test tools plus sample_registry / sample_invoker fixtures provide clear, deterministic setup for tool-system unit tests.

Source fixes:
- Fix to_definition() schema leak: use property (deep copy) instead of
  raw _parameters_schema reference (Greptile, Gemini, type-analyzer)
- Refactor invoke() into 4 private helpers to meet <50-line guideline
  (CodeRabbit)
- Add logging before re-raising MemoryError/RecursionError (CodeRabbit)
- Add catch-all except in _validate_params for SSRF/ref errors
- Change TOOL_REGISTRY_BUILT log level from DEBUG to INFO (CodeRabbit)
- Fix inconsistent log key name= to tool_name= in registry (Copilot)
- Add debug log for TypeError in __contains__ (silent-failure-hunter)
- Add logger to base.py, log before ValueError raise (logging-audit)
- Fix execute() docstring: caveat no-schema case (comment-analyzer)
- Fix registry module docstring "tuples" wording (comment-analyzer)
- Remove TODO-like "future iteration" note from invoke_all (comment-analyzer)

New tests (12 added, 77 total):
- RecursionError re-raise propagation (2 tests)
- jsonschema.SchemaError path with invalid tool schema (2 tests)
- SSRF protection via blocked remote $ref resolution (2 tests)
- Empty exception message fallback formatting (1 test)

New event constants:
- TOOL_INVOKE_NON_RECOVERABLE, TOOL_BASE_INVALID_NAME,
  TOOL_REGISTRY_CONTAINS_TYPE_ERROR

All 1672 tests pass, mypy strict clean, ruff clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/tools/invoker.py`:
- Around line 57-59: Update the docstrings in src/ai_company/tools/invoker.py to
accurately describe the error contract: state that while most tool errors are
caught and converted to ToolResult(is_error=True), certain non-recoverable
exceptions are re-raised and not swallowed. Modify the docstrings for the public
entry points (e.g., invoke, any public ToolInvoker method) and any internal
methods that currently claim "all errors are caught" (referencing ToolResult and
the method names present in this file) so callers know which errors will be
returned as ToolResult and which will propagate.
- Around line 133-197: The _validate_params method is too large and should be
split into focused helpers: extract the jsonschema.validate call and try/except
into a new helper (e.g., _run_schema_validation(tool_call, schema) that returns
None or raises the underlying exception), and create small mapping helpers like
_map_schema_error(exc, tool_call) , _map_validation_error(exc, tool_call) and
_map_generic_error(exc, tool_call) that each log using TOOL_INVOKE_SCHEMA_ERROR
or TOOL_INVOKE_PARAMETER_ERROR and return the appropriate ToolResult (using
ToolResult and ToolParameterError) or None; then refactor _validate_params to
simply fetch schema, call _run_schema_validation with _SAFE_REGISTRY, catch
exceptions and delegate to the mapping helpers, keeping _validate_params under
50 lines.

In `@src/ai_company/tools/registry.py`:
- Around line 110-119: The __contains__ method currently only logs
TOOL_REGISTRY_CONTAINS_TYPE_ERROR on TypeError (unhashable inputs) but silently
returns False for hashable non-str inputs; update __contains__ (in class using
self._tools) to consistently emit the observability event whenever the queried
name is not a str (e.g., use isinstance(name, str) check) before membership
testing, calling logger.debug(TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
name_type=type(name).__name__) for non-str inputs, and then return False; keep
the existing TypeError handling for unhashable values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2323a323-ac50-448c-a2d1-abb71379af44

📥 Commits

Reviewing files that changed from the base of the PR and between 54e1337 and e3edf25.

📒 Files selected for processing (6)
  • src/ai_company/observability/events.py
  • src/ai_company/tools/base.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/registry.py
  • tests/unit/tools/conftest.py
  • tests/unit/tools/test_invoker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models with BaseModel, model_validator, and ConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with: from ai_company.observability import get_logger then logger = get_logger(__name__)
Always use variable name logger (not _logger, not log) for the logging instance
Pure data models, enums, and re-exports do not require logging

Files:

  • src/ai_company/tools/registry.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/observability/events.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/conftest.py
  • src/ai_company/tools/base.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Never use import logging, logging.getLogger(), or print() in application code — use the project's logger
Use event name constants from ai_company.observability.events rather than inline strings
Use structured logging format: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

  • src/ai_company/tools/registry.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/observability/events.py
  • src/ai_company/tools/base.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs

Files:

  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/conftest.py
🧠 Learnings (17)
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/ai_company/tools/registry.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/**/*.py : All functions and methods should have appropriate logging using `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

  • src/ai_company/tools/registry.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use structured logging format: `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`

Applied to files:

  • src/ai_company/tools/registry.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All functions and methods should have appropriate logging for debugging and traceability. Use `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

  • src/ai_company/tools/registry.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions

Applied to files:

  • src/ai_company/tools/registry.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14

Applied to files:

  • src/ai_company/tools/invoker.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

  • src/ai_company/tools/invoker.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/ai_company/tools/invoker.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR level with context before raising

Applied to files:

  • src/ai_company/tools/invoker.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • src/ai_company/tools/invoker.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings

Applied to files:

  • src/ai_company/observability/events.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

  • tests/unit/tools/conftest.py
🔇 Additional comments (4)
src/ai_company/observability/events.py (1)

111-125: Tool lifecycle event constants look solid.

Clear, consistent event catalog extension for the tool subsystem.

tests/unit/tools/test_invoker.py (1)

12-336: Great invoker test coverage.

The suite exercises happy paths, error mapping, schema failures, SSRF blocking behavior, and non-recoverable propagation in a clean, structured way.

src/ai_company/tools/base.py (1)

63-118: Base abstractions are implemented cleanly.

Name validation, schema deep-copy protection, and to_definition() mapping are all in good shape for safe tool contracts.

tests/unit/tools/conftest.py (1)

15-282: Fixture scaffolding is strong and practical.

The private tool doubles and invoker/registry fixtures make edge-case testing straightforward and keep test setup isolated.

Comment on lines +110 to +119
def __contains__(self, name: object) -> bool:
"""Check whether a tool name is registered."""
try:
return name in self._tools
except TypeError:
logger.debug(
TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
name_type=type(name).__name__,
)
return False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Log non-string membership checks consistently.

__contains__ logs TOOL_REGISTRY_CONTAINS_TYPE_ERROR only when input is unhashable. Hashable non-str values (for example, 1) return False without the observability event.

Proposed fix
     def __contains__(self, name: object) -> bool:
         """Check whether a tool name is registered."""
-        try:
-            return name in self._tools
-        except TypeError:
+        if not isinstance(name, str):
             logger.debug(
                 TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
                 name_type=type(name).__name__,
             )
             return False
+        return name in self._tools
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __contains__(self, name: object) -> bool:
"""Check whether a tool name is registered."""
try:
return name in self._tools
except TypeError:
logger.debug(
TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
name_type=type(name).__name__,
)
return False
def __contains__(self, name: object) -> bool:
"""Check whether a tool name is registered."""
if not isinstance(name, str):
logger.debug(
TOOL_REGISTRY_CONTAINS_TYPE_ERROR,
name_type=type(name).__name__,
)
return False
return name in self._tools
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/registry.py` around lines 110 - 119, The __contains__
method currently only logs TOOL_REGISTRY_CONTAINS_TYPE_ERROR on TypeError
(unhashable inputs) but silently returns False for hashable non-str inputs;
update __contains__ (in class using self._tools) to consistently emit the
observability event whenever the queried name is not a str (e.g., use
isinstance(name, str) check) before membership testing, calling
logger.debug(TOOL_REGISTRY_CONTAINS_TYPE_ERROR, name_type=type(name).__name__)
for non-str inputs, and then return False; keep the existing TypeError handling
for unhashable values.

…refactors)

- Add MemoryError/RecursionError re-raise in _validate_params before catch-all
- Use dedicated TOOL_INVOKE_VALIDATION_UNEXPECTED event for catch-all handler
- Extract 3 error handler helpers to keep _validate_params under 50 lines
- Update docstrings to document non-recoverable error propagation
- Use isinstance check in ToolRegistry.__contains__ instead of try/except

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 16:44
@Aureliolo Aureliolo merged commit c51068b into main Mar 5, 2026
1 of 2 checks passed
@Aureliolo Aureliolo deleted the feat/tool-system branch March 5, 2026 16:45
return self._schema_error_result(tool_call, exc.message)
except jsonschema.ValidationError as exc:
return self._param_error_result(tool_call, exc.message)
except (MemoryError, RecursionError) as exc:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 758 no-parentheses convention not applied

CLAUDE.md explicitly states the project convention: "PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14." Both MemoryError/RecursionError guard clauses still use the parenthesized form.

The same violation occurs at line 240.

Suggested change
except (MemoryError, RecursionError) as exc:
except MemoryError, RecursionError as exc:

Context Used: Rule from dashboard - CLAUDE.md (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 157

Comment:
**PEP 758 no-parentheses convention not applied**

`CLAUDE.md` explicitly states the project convention: *"PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14."* Both `MemoryError`/`RecursionError` guard clauses still use the parenthesized form.

The same violation occurs at line 240.

```suggestion
        except MemoryError, RecursionError as exc:
```

**Context Used:** Rule from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=6816cd03-d0e1-4fd0-9d04-2417487a584c))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

"""Execute the tool, catching errors as ``ToolResult``."""
try:
return await tool.execute(arguments=dict(tool_call.arguments))
except (MemoryError, RecursionError) as exc:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 758 no-parentheses convention not applied (second occurrence)

Same CLAUDE.md convention applies here — except A, B: without parentheses. See also line 157.

Suggested change
except (MemoryError, RecursionError) as exc:
except MemoryError, RecursionError as exc:

Context Used: Rule from dashboard - CLAUDE.md (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 240

Comment:
**PEP 758 no-parentheses convention not applied (second occurrence)**

Same `CLAUDE.md` convention applies here — `except A, B:` without parentheses. See also line 157.

```suggestion
        except MemoryError, RecursionError as exc:
```

**Context Used:** Rule from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=6816cd03-d0e1-4fd0-9d04-2417487a584c))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +176 to +181
logger.exception(
TOOL_INVOKE_SCHEMA_ERROR,
tool_call_id=tool_call.id,
tool_name=tool_call.name,
error=error_msg,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.exception called outside the immediate except scope

_schema_error_result is a helper method called from within an except jsonschema.SchemaError block, and _unexpected_validation_result is called from except Exception. Python does propagate sys.exc_info() into called functions, so both calls work correctly today — the traceback is captured. However, this is a fragile pattern: if either helper is ever called from a non-except context in a future refactor (e.g., called proactively with a pre-built error message), logger.exception() will silently log NoneType: None as the exception info, making the log record misleading without any error at the call site.

A more explicit and resilient approach is to accept the live exception as a parameter:

def _schema_error_result(
    self,
    tool_call: ToolCall,
    error_msg: str,
    exc: Exception | None = None,
) -> ToolResult:
    logger.error(
        TOOL_INVOKE_SCHEMA_ERROR,
        ...,
        exc_info=exc,
    )

This makes the dependency on exception context explicit and prevents future accidental misuse.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 176-181

Comment:
**`logger.exception` called outside the immediate `except` scope**

`_schema_error_result` is a helper method called from within an `except jsonschema.SchemaError` block, and `_unexpected_validation_result` is called from `except Exception`. Python does propagate `sys.exc_info()` into called functions, so both calls work correctly today — the traceback is captured. However, this is a fragile pattern: if either helper is ever called from a non-except context in a future refactor (e.g., called proactively with a pre-built error message), `logger.exception()` will silently log `NoneType: None` as the exception info, making the log record misleading without any error at the call site.

A more explicit and resilient approach is to accept the live exception as a parameter:

```python
def _schema_error_result(
    self,
    tool_call: ToolCall,
    error_msg: str,
    exc: Exception | None = None,
) -> ToolResult:
    logger.error(
        TOOL_INVOKE_SCHEMA_ERROR,
        ...,
        exc_info=exc,
    )
```

This makes the dependency on exception context explicit and prevents future accidental misuse.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 17 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Returns:
Tuple of results in the same order as the input.
"""
return tuple([await self.invoke(call) for call in tool_calls])
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builds an intermediate list before converting to a tuple. For large tool call batches, avoid the extra allocation by accumulating results in a preallocated list + converting once, or by using a direct tuple construction approach that doesn’t materialize an intermediate list.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +25
def __init__(
self,
message: str,
*,
context: dict[str, Any] | None = None,
) -> None:
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context parameter is typed as dict[str, Any], but the implementation immediately copies it and stores an immutable mapping. Accepting a broader Mapping[str, Any] | None (and documenting that it’s copied) would make the API more flexible for callers (e.g., when they already have an immutable mapping) without changing runtime behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +86 to +90
logger.warning(
TOOL_NOT_FOUND,
tool_name=name,
available=available,
)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tool-miss is logged here (TOOL_NOT_FOUND) and then logged again in ToolInvoker._lookup_tool (TOOL_INVOKE_NOT_FOUND). Consider consolidating responsibility for logging misses in one layer (registry or invoker) to reduce duplicate log volume and make alerting/dashboards less noisy.

Suggested change
logger.warning(
TOOL_NOT_FOUND,
tool_name=name,
available=available,
)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Design and implement basic tool system (registry, invocation, results)

2 participants