Skip to content

feat: implement tool permission checking (#16)#147

Merged
Aureliolo merged 3 commits intomainfrom
feat/tool-permission-checking
Mar 7, 2026
Merged

feat: implement tool permission checking (#16)#147
Aureliolo merged 3 commits intomainfrom
feat/tool-permission-checking

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • ToolPermissionChecker enforces access-level gating with priority-based resolution: denied list → allowed list → level categories → deny
  • BaseTool.category attribute (ToolCategory enum) enables per-tool categorization for permission checks
  • ToolInvoker integrates permission checking: filters tool definitions for LLM prompt + checks at invocation time (defense-in-depth)
  • AgentEngine creates permission-aware invokers from AgentIdentity.tools at the start of each run() call
  • 5 access levels: sandboxed → restricted → standard → elevated (hierarchical), plus custom (allow-list only)
  • New ToolPermissionDeniedError in the tool error hierarchy
  • ToolAccessLevel and ToolCategory enums added to core.enums
  • ToolPermissions.access_level field added to the agent identity card

Pre-PR Review Fixes (10 agents, 18 findings addressed)

  • Fixed integration tests missing ToolCategory on test tools
  • Added integration test for permission-denied tool call path (E2E)
  • Added filter_definitions tests (denied list, sort order)
  • Fixed invoke() docstring to include permission-check step
  • Fixed ToolAccessLevel docstring re: CUSTOM level semantics
  • Made filter_definitions sort explicit + added DEBUG log for filtered-out tools
  • Changed .get() to direct dict access for fail-loud on unmapped access levels
  • Fixed denial_reason / _check_permission docstrings
  • Extracted format_task_instruction to prompt.py (agent_engine.py under 800 lines)
  • Updated DESIGN_SPEC.md: §3.1 access_level, §11.1.1 permission model, §11.2 M3 scope note, §15.3 project structure, §15.5 conventions

Closes #16

Test plan

  • 2053 tests pass (3 new tests added)
  • 95.40% coverage (80% minimum)
  • ruff lint + format clean
  • mypy strict clean
  • All pre-commit hooks pass
  • CI pipeline (lint + type-check + test)

Review coverage

Pre-reviewed by 10 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency. 27 findings triaged, 18 implemented, 9 skipped (by design / future scope / too minor).

Aureliolo and others added 2 commits March 7, 2026 07:47
Add access-level gating for tool invocations per DESIGN_SPEC §11.2.
Permission resolution uses priority order: denied list > allowed list >
access level categories. Progressive trust is disabled for M3 (static
access levels only).

New types: ToolAccessLevel (5 levels), ToolCategory (12 categories),
ToolPermissionChecker, ToolPermissionDeniedError. Permission checking
integrates at both prompt filtering (LLM sees only permitted tools)
and runtime enforcement (belt-and-suspenders deny on invoke).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 18 findings addressed:

- Fix integration tests missing ToolCategory on test tools
- Add permission-denied integration test (E2E denial path)
- Add filter_definitions tests (denied list, sort order)
- Fix invoke() docstring to include permission-check step
- Fix ToolAccessLevel docstring re: CUSTOM level semantics
- Make filter_definitions sort explicit + add DEBUG log
- Change .get() to direct dict access for fail-loud on unmapped levels
- Fix denial_reason/check_permission docstrings
- Extract format_task_instruction to prompt.py (agent_engine < 800 lines)
- Update DESIGN_SPEC.md: §3.1 access_level, §11.1.1 permission model,
  §11.2 M3 scope note, §15.3 project structure, §15.5 conventions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 7, 2026 07:10
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 7, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e158bbf4-cb74-41a0-a875-c31e5d8373d0

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1abdc and 63f8b23.

📒 Files selected for processing (10)
  • DESIGN_SPEC.md
  • src/ai_company/core/enums.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/base.py
  • src/ai_company/tools/permissions.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/tools/conftest.py
  • tests/unit/tools/test_base.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/test_permissions.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Permission-based tool access with hierarchical access levels (Sandboxed, Restricted, Standard, Elevated, Custom).
    • Tools now have category classifications (e.g., Code Execution, File System, Web).
    • Runtime enforcement filters tools shown to the assistant and blocks unauthorized invocations with clear denial reasons.
  • Documentation

    • Updated design notes describing category-level gating, permission behavior, and planned finer-grained sandboxing controls.

Walkthrough

Adds category- and access-level-based tool permissioning: new enums (ToolAccessLevel, ToolCategory), a ToolPermissionChecker, permission-aware ToolInvoker flow (pre-validation checks and filtering), BaseTool category support, observability events, and tests/integration wiring to enforce and surface permission denials.

Changes

Cohort / File(s) Summary
Design & Public API
DESIGN_SPEC.md, src/ai_company/core/enums.py, src/ai_company/core/__init__.py
Adds ToolAccessLevel and ToolCategory enums; documents M3 category-level gating and new public permission/sandboxing types.
Agent Permissions Model
src/ai_company/core/agent.py
Adds ToolPermissions.access_level: ToolAccessLevel to model agent permission level.
Permission Checker
src/ai_company/tools/permissions.py
New ToolPermissionChecker implementing priority-based resolution (denied > allowed > level/category > deny), filter_definitions(), denial_reason(), and from_permissions().
Tool API & Errors
src/ai_company/tools/base.py, src/ai_company/tools/errors.py, src/ai_company/tools/__init__.py
BaseTool now accepts/exposes a category; adds ToolPermissionDeniedError and exports permission types from tools package.
Invoker Integration
src/ai_company/tools/invoker.py
ToolInvoker gains optional permission_checker, get_permitted_definitions(), _check_permission(), and pre-validation permission check returning a denial ToolResult.
Engine Integration & Prompts
src/ai_company/engine/agent_engine.py, src/ai_company/engine/prompt.py, src/ai_company/engine/react_loop.py
AgentEngine constructs a permission-aware ToolInvoker (_make_tool_invoker) and threads it into context/execute paths; replaces internal formatting helper with format_task_instruction; react loop now uses get_permitted_definitions().
Observability
src/ai_company/observability/events/tool.py
Adds events: TOOL_PERMISSION_DENIED, TOOL_PERMISSION_CHECKER_CREATED, TOOL_PERMISSION_FILTERED.
Examples & Tests
src/ai_company/tools/examples/echo.py, tests/... (many files)
EchoTool updated to set category; tests and fixtures updated/added extensively for permission scenarios and to exercise filtering and denial behavior (unit + integration).

Sequence Diagram(s)

sequenceDiagram
    participant Agent as AgentEngine
    participant Invoker as ToolInvoker
    participant Checker as ToolPermissionChecker
    participant Registry as ToolRegistry
    participant Tool as BaseTool

    Agent->>Invoker: new(registry, permission_checker)
    Agent->>Invoker: get_permitted_definitions()
    Invoker->>Checker: filter_definitions(registry)
    Checker-->>Invoker: permitted ToolDefinitions

    Agent->>Invoker: invoke(tool_call)
    Invoker->>Checker: _check_permission(tool_name, category)
    alt denied
        Checker-->>Invoker: raises/returns denial
        Invoker-->>Agent: ToolResult(error=ToolPermissionDeniedError)
    else allowed
        Invoker->>Tool: validate params / execute
        Tool-->>Invoker: ToolResult(success)
        Invoker-->>Agent: ToolResult
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 49.48% which is insufficient. The required threshold is 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: implement tool permission checking (#16)' clearly and concisely summarizes the main change—implementing a tool permission system—and directly corresponds to the primary objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the ToolPermissionChecker implementation, access levels, integration points, and pre-PR fixes—all substantive changes present in the PR.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements of issue #16: implements ToolPermissionChecker with priority-based resolution, adds ToolAccessLevel and ToolCategory enums, integrates permission checks at tool invocation, enforces access levels and allow/deny lists, provides clear denial messages, ensures fast config-based checks, and includes comprehensive unit and integration tests.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the stated objectives of issue #16. The extraction of format_task_instruction to prompt.py, DESIGN_SPEC.md updates, and test additions are all in service of the permission-checking feature and code organization goals mentioned in the PR description.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/tool-permission-checking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a robust tool permission checking system, enhancing the security and control over which tools AI agents can access and execute. By introducing explicit access levels, tool categorization, and a dedicated permission checker, the system ensures that agents only interact with approved functionalities, both at the prompt generation stage and during actual tool invocation. This provides a layered defense mechanism, preventing unauthorized tool usage and improving the overall reliability and safety of agent operations.

Highlights

  • Tool Permission Checking: Introduced a ToolPermissionChecker to enforce access-level gating for tools. This checker uses a priority-based system: explicitly denied tools are blocked first, then explicitly allowed tools are permitted, followed by category-based access levels, and finally, denial by default. Name matching for allowed/denied lists is case-insensitive.
  • ToolInvoker Integration: The ToolInvoker now integrates the ToolPermissionChecker. It filters tool definitions presented to the LLM based on permissions and performs a defense-in-depth check at invocation time, returning a ToolPermissionDeniedError if an unpermitted tool is called.
  • Agent Identity and Tool Permissions: The AgentIdentity model now includes a ToolPermissions.access_level field, allowing agents to be configured with one of five hierarchical access levels (sandboxed, restricted, standard, elevated) or a custom level, which dictates the categories of tools they can use.
  • Tool Categorization: A new BaseTool.category attribute (using the ToolCategory enum) has been added to enable per-tool categorization, which is fundamental for the access-level gating mechanism.
  • Refactoring and Error Handling: The _format_task_instruction helper function was extracted from agent_engine.py to prompt.py for better modularity. A new ToolPermissionDeniedError was added to the tool error hierarchy for specific permission-related failures.
Changelog
  • DESIGN_SPEC.md
    • Updated the tools configuration to include an access_level field.
    • Added a detailed section on 'Permission checking (M3)' in 11.1.1, outlining the new ToolPermissionChecker and its integration.
    • Included an 'M3 implementation note' in 11.2 clarifying the current category-level gating.
    • Updated the project structure to reflect the new permissions.py file and ToolCategory in base.py.
    • Added 'Tool permission checking' to the conventions table.
  • src/ai_company/core/init.py
    • Imported ToolAccessLevel and ToolCategory enums.
  • src/ai_company/core/agent.py
    • Imported ToolAccessLevel.
    • Added access_level field to the ToolPermissions model with a default of ToolAccessLevel.STANDARD.
  • src/ai_company/core/enums.py
    • Defined new ToolAccessLevel and ToolCategory StrEnum classes.
  • src/ai_company/engine/agent_engine.py
    • Imported format_task_instruction and ToolPermissionChecker.
    • Modified run and _execute methods to accept and pass tool_invoker.
    • Updated _prepare_context to use the tool_invoker for retrieving permitted tool definitions.
    • Refactored _format_task_instruction by moving it to prompt.py and updating its call site.
    • Modified _make_tool_invoker to instantiate a ToolPermissionChecker based on agent identity and pass it to the ToolInvoker.
  • src/ai_company/engine/prompt.py
    • Moved the _format_task_instruction function from agent_engine.py and renamed it to format_task_instruction.
  • src/ai_company/engine/react_loop.py
    • Updated _get_tool_definitions to retrieve permitted tool definitions via tool_invoker.get_permitted_definitions().
  • src/ai_company/observability/events/tool.py
    • Added new event constants: TOOL_PERMISSION_DENIED, TOOL_PERMISSION_CHECKER_CREATED, and TOOL_PERMISSION_FILTERED.
  • src/ai_company/tools/init.py
    • Updated the module docstring to reflect the inclusion of permissions.
    • Imported and exposed ToolPermissionDeniedError and ToolPermissionChecker.
  • src/ai_company/tools/base.py
    • Imported ToolCategory.
    • Added a category attribute to the BaseTool class, defaulting to ToolCategory.OTHER.
  • src/ai_company/tools/errors.py
    • Added the ToolPermissionDeniedError class to the tool error hierarchy.
  • src/ai_company/tools/examples/echo.py
    • Added category=ToolCategory.OTHER to the EchoTool initialization.
  • src/ai_company/tools/invoker.py
    • Imported TOOL_PERMISSION_DENIED and ToolPermissionChecker.
    • Added an optional permission_checker parameter to the constructor.
    • Implemented get_permitted_definitions to filter tools based on permissions.
    • Added _check_permission to perform permission validation before tool execution.
    • Integrated permission checking into the invoke method's workflow.
  • src/ai_company/tools/permissions.py
    • Added new file permissions.py containing the ToolPermissionChecker class.
    • Implemented logic for ToolPermissionChecker including access level categories, explicit allow/deny lists, and priority-based resolution.
    • Provided methods for checking permissions (is_permitted, check), generating denial reasons (denial_reason), and filtering tool definitions (filter_definitions).
  • tests/integration/engine/test_agent_engine_integration.py
    • Imported ToolPermissions, ToolAccessLevel, and ToolCategory.
    • Updated UppercaseTool initialization to include a category.
    • Added a new integration test case TestPermissionDeniedToolCall to verify that a sandboxed agent is correctly denied access to a deployment tool.
  • tests/unit/engine/test_agent_engine.py
    • Updated EchoTool initialization to include a category.
  • tests/unit/engine/test_run_result.py
    • Updated the module docstring.
    • Removed the import of _format_task_instruction from agent_engine.py.
    • Imported format_task_instruction from prompt.py.
    • Updated all calls to _format_task_instruction to use format_task_instruction.
  • tests/unit/tools/conftest.py
    • Imported ToolAccessLevel, ToolCategory, and ToolPermissionChecker.
    • Added _CategorizedTool for use in permission-related tests.
    • Introduced new fixtures: permission_registry, permission_checker, and permission_invoker for testing tool permissions.
  • tests/unit/tools/test_invoker.py
    • Imported ToolRegistry.
    • Added TestInvokerPermissionCheck to test permission checking within the ToolInvoker.
    • Added TestGetPermittedDefinitions to verify the filtering of tool definitions based on permissions.
  • tests/unit/tools/test_permissions.py
    • Added new file test_permissions.py with comprehensive unit tests for the ToolPermissionChecker class.
    • Included tests for various access levels, explicit allow/deny lists, resolution priority, denial reason generation, and definition filtering.
Activity
  • The pull request includes ToolPermissionChecker to enforce access-level gating with priority-based resolution: denied list → allowed list → level categories → deny.
  • The BaseTool.category attribute (ToolCategory enum) was added to enable per-tool categorization for permission checks.
  • The ToolInvoker now integrates permission checking, filtering tool definitions for LLM prompts and checking at invocation time for defense-in-depth.
  • The AgentEngine creates permission-aware invokers from AgentIdentity.tools at the start of each run() call.
  • Five access levels were introduced: sandboxed → restricted → standard → elevated (hierarchical), plus custom (allow-list only).
  • A new ToolPermissionDeniedError was added to the tool error hierarchy.
  • ToolAccessLevel and ToolCategory enums were added to core.enums.
  • The ToolPermissions.access_level field was added to the agent identity card.
  • Pre-PR review fixes addressed 18 findings from 10 agents, including integration tests for ToolCategory and permission-denied tool calls, filter_definitions tests, docstring fixes, and refactoring format_task_instruction.
  • The test plan confirms 2053 passing tests (3 new), 95.40% coverage, ruff lint/format clean, mypy strict clean, and all pre-commit hooks passing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive tool permission checking system, defining ToolAccessLevel and ToolCategory enums and integrating them into the AgentIdentity and ToolPermissions models. A new ToolPermissionChecker class is implemented to enforce these permissions, using a priority-based system of explicit deny/allow lists and hierarchical access levels (sandboxed, restricted, standard, elevated, custom) based on tool categories. The ToolInvoker now utilizes this checker to filter tool definitions presented to the LLM and to validate tool calls at invocation time, ensuring agents only use permitted tools. The AgentEngine has been updated to incorporate this new permission logic, including moving the format_task_instruction helper to prompt.py. Review comments identify a critical bug in agent_engine.py where RecursionError is not properly caught due to outdated Python 2 exception syntax, a prompt injection vulnerability in format_task_instruction due to direct embedding of untrusted task content, and suggest improving maintainability in ToolPermissionChecker by defining hierarchical permission sets incrementally.

timeout_seconds=timeout_seconds,
tool_invoker=tool_invoker,
)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The code uses the outdated Python 2 syntax for catching multiple exceptions: except MemoryError, RecursionError:. In Python 3, this is interpreted as except MemoryError as RecursionError:, which means it only catches MemoryError and assigns the exception object to the variable name RecursionError. Consequently, RecursionError is NOT caught by this block. This violates the design principle that RecursionError is a non-recoverable error that should be re-raised. Instead, it will be caught by subsequent except Exception blocks and handled as a recoverable error, which could lead to inconsistent system state.

        except (MemoryError, RecursionError):

Returns:
Markdown-formatted task instruction string.
"""
parts = [f"# Task: {task.title}", "", task.description]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Untrusted task content (task.title and task.description) is directly included in the LLM prompt without proper sanitization or delimiting. An attacker who can control the task metadata could inject malicious instructions to manipulate the agent's behavior (Prompt Injection). It is recommended to use clear delimiters (e.g., XML tags) and instructions to the LLM to treat the content as data.

Comment on lines +51 to +79
_LEVEL_CATEGORIES: ClassVar[dict[ToolAccessLevel, frozenset[ToolCategory]]] = {
ToolAccessLevel.SANDBOXED: frozenset(
{
ToolCategory.FILE_SYSTEM,
ToolCategory.CODE_EXECUTION,
ToolCategory.VERSION_CONTROL,
}
),
ToolAccessLevel.RESTRICTED: frozenset(
{
ToolCategory.FILE_SYSTEM,
ToolCategory.CODE_EXECUTION,
ToolCategory.VERSION_CONTROL,
ToolCategory.WEB,
}
),
ToolAccessLevel.STANDARD: frozenset(
{
ToolCategory.FILE_SYSTEM,
ToolCategory.CODE_EXECUTION,
ToolCategory.VERSION_CONTROL,
ToolCategory.WEB,
ToolCategory.TERMINAL,
ToolCategory.ANALYTICS,
}
),
ToolAccessLevel.ELEVATED: frozenset(ToolCategory),
ToolAccessLevel.CUSTOM: frozenset(),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better maintainability, you could define the hierarchical permission sets incrementally. This avoids repeating the categories for each access level and reduces the chance of inconsistencies if categories are added or changed in the future.

    _SANDBOXED_CATS = frozenset(
        {
            ToolCategory.FILE_SYSTEM,
            ToolCategory.CODE_EXECUTION,
            ToolCategory.VERSION_CONTROL,
        }
    )
    _RESTRICTED_CATS = _SANDBOXED_CATS | {ToolCategory.WEB}
    _STANDARD_CATS = _RESTRICTED_CATS | {
        ToolCategory.TERMINAL,
        ToolCategory.ANALYTICS,
    }

    _LEVEL_CATEGORIES: ClassVar[dict[ToolAccessLevel, frozenset[ToolCategory]]] = {
        ToolAccessLevel.SANDBOXED: _SANDBOXED_CATS,
        ToolAccessLevel.RESTRICTED: _RESTRICTED_CATS,
        ToolAccessLevel.STANDARD: _STANDARD_CATS,
        ToolAccessLevel.ELEVATED: frozenset(ToolCategory),
        ToolAccessLevel.CUSTOM: frozenset(),
    }

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ai_company/tools/base.py (1)

70-77: ⚠️ Potential issue | 🟠 Major

Make tool categorization explicit instead of defaulting to OTHER.

The category parameter defaults to ToolCategory.OTHER, and at least 13 existing BaseTool subclasses in the codebase do not pass category= to super().__init__(). These tools will silently fall back to OTHER, creating a fail-open permission model where every uncategorized legacy tool is treated as intentionally low-risk. If access-level checks permit OTHER, this becomes a security boundary violation. Require an explicit category for every tool before shipping.

Suggested change
     def __init__(
         self,
         *,
         name: str,
         description: str = "",
         parameters_schema: dict[str, Any] | None = None,
-        category: ToolCategory = ToolCategory.OTHER,
+        category: ToolCategory,
     ) -> None:

Also applies to: 93-96, 107-110

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/base.py` around lines 70 - 77, The BaseTool constructor
currently defaults category to ToolCategory.OTHER which causes silent, unsafe
fallbacks; modify BaseTool.__init__ to require category (remove the default),
and add a defensive runtime check (raise ValueError) if category is missing or
None so instantiations fail loudly; update all BaseTool subclasses and any calls
that relied on the default to pass an explicit category argument to
super().__init__; also apply the same change to the other BaseTool constructor
overloads/variants (the other __init__ signatures around the class) so all entry
points enforce explicit ToolCategory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/ai_company/tools/base.py`:
- Around line 70-77: The BaseTool constructor currently defaults category to
ToolCategory.OTHER which causes silent, unsafe fallbacks; modify
BaseTool.__init__ to require category (remove the default), and add a defensive
runtime check (raise ValueError) if category is missing or None so
instantiations fail loudly; update all BaseTool subclasses and any calls that
relied on the default to pass an explicit category argument to super().__init__;
also apply the same change to the other BaseTool constructor overloads/variants
(the other __init__ signatures around the class) so all entry points enforce
explicit ToolCategory.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d354dca-9812-45d9-a126-3e5c49c54331

📥 Commits

Reviewing files that changed from the base of the PR and between 57e487b and 2e1abdc.

📒 Files selected for processing (20)
  • DESIGN_SPEC.md
  • src/ai_company/core/__init__.py
  • src/ai_company/core/agent.py
  • src/ai_company/core/enums.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/base.py
  • src/ai_company/tools/errors.py
  • src/ai_company/tools/examples/echo.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/tools/permissions.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_run_result.py
  • tests/unit/tools/conftest.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/test_permissions.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations—Python 3.14 has PEP 649
Use except A, B: syntax (no parentheses) for exception handling on Python 3.14—ruff enforces this
Add type hints to all public functions in Python; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions—ruff D rules enforce this
Create new objects instead of mutating existing ones; use copy.deepcopy() at construction for non-Pydantic internal collections and MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, and ConfigDict
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Enforce line length of 88 characters (ruff enforces this)
Functions should be less than 50 lines, files less than 800 lines
Handle errors explicitly; never silently swallow errors in Python code
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/ai_company/tools/errors.py
  • src/ai_company/core/__init__.py
  • src/ai_company/tools/permissions.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/core/agent.py
  • tests/unit/tools/test_permissions.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/base.py
  • tests/unit/engine/test_run_result.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/examples/echo.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/tools/conftest.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/core/enums.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Every module with business logic MUST import from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code
Always use logger as the variable name for loggers (not _logger, not log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with logger.info(EVENT, key=value) format—never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging should be used for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging

Files:

  • src/ai_company/tools/errors.py
  • src/ai_company/core/__init__.py
  • src/ai_company/tools/permissions.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/core/agent.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/base.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/examples/echo.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/core/enums.py
{src/**/*.py,tests/**/*.py,src/**/*.yaml,src/**/*.yml,tests/**/*.yaml,tests/**/*.yml,examples/**/*.yaml,examples/**/*.yml}

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ skill/agent files, (3) third-party import paths/module names (e.g. litellm.types.llms.openai). Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/tools/errors.py
  • src/ai_company/core/__init__.py
  • src/ai_company/tools/permissions.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/core/agent.py
  • tests/unit/tools/test_permissions.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/base.py
  • tests/unit/engine/test_run_result.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/tools/examples/echo.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/tools/conftest.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/core/enums.py
src/ai_company/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/react_loop.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark unit tests with @pytest.mark.unit, integration tests with @pytest.mark.integration, e2e tests with @pytest.mark.e2e, and slow tests with @pytest.mark.slow
Use asyncio_mode = 'auto' for pytest async tests—no manual @pytest.mark.asyncio needed
Set a 30-second timeout per test

Files:

  • tests/unit/tools/test_permissions.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_run_result.py
  • tests/unit/tools/test_invoker.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/tools/conftest.py
🧠 Learnings (3)
📚 Learning: 2026-03-06T21:51:55.175Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T21:51:55.175Z
Learning: Applies to **/*.py : Use Pydantic v2 with `BaseModel`, `model_validator`, `computed_field`, and `ConfigDict`

Applied to files:

  • src/ai_company/tools/base.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_run_result.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use BaseAgent methods for Ollama integration: `self.generate(prompt)` for LLM calls with built-in retry logic, `self.settings` for model configuration, and `self.client` for Ollama client instance

Applied to files:

  • tests/unit/engine/test_run_result.py
🔇 Additional comments (45)
src/ai_company/tools/errors.py (1)

59-60: Nice addition of a dedicated permission error.

A distinct ToolPermissionDeniedError makes the denied path much easier to handle explicitly in the invoker, engine, and tests.

src/ai_company/engine/prompt.py (1)

641-665: Good extraction.

Pulling task-message rendering into a dedicated public helper keeps the prompt-building path smaller and makes this formatting logic easier to test in isolation.

src/ai_company/observability/events/tool.py (1)

22-24: Good observability hooks.

These events cover the important permission checkpoints and should make denied/filtering behavior much easier to trace in production logs.

tests/unit/engine/test_run_result.py (1)

19-20: Good coverage for the extracted formatter.

The updated tests exercise the main output combinations around criteria, budget, and deadline formatting, which makes the API move low-risk.

Also applies to: 201-205, 213-214, 221-221, 241-242, 262-262, 285-285

src/ai_company/tools/examples/echo.py (1)

9-9: Nice tightening of the example tool.

Adding an explicit category and making the schema strict keeps the example aligned with the new permission model and avoids silently accepting extra input.

Also applies to: 29-35

src/ai_company/core/agent.py (1)

141-152: Clean config-model extension.

Adding access_level directly to ToolPermissions fits the new permission model well and keeps the static tool policy on the frozen identity card where it belongs.

tests/unit/tools/conftest.py (1)

407-463: Useful permission-test scaffolding.

The categorized fixture registry makes the access-level tests much easier to read and should keep follow-up permission cases cheap to add.

src/ai_company/engine/react_loop.py (1)

409-416: LGTM!

The change correctly integrates permission-aware tool definition retrieval. The get_permitted_definitions() method handles both permission-filtered and unfiltered cases appropriately, and the docstring accurately reflects the new behavior.

tests/unit/engine/test_agent_engine.py (1)

296-316: LGTM!

The test correctly adds the required category parameter to align with the updated BaseTool signature. Using ToolCategory.CODE_EXECUTION is appropriate for this test tool, and the lazy import pattern keeps the test self-contained.

src/ai_company/core/__init__.py (1)

36-37: LGTM!

The new enums ToolAccessLevel and ToolCategory are correctly imported and exported, maintaining alphabetical ordering in __all__.

Also applies to: 98-99

tests/unit/tools/test_invoker.py (2)

671-722: LGTM!

The TestInvokerPermissionCheck class provides comprehensive coverage for permission checking behavior:

  • Verifies no-checker allows all tools
  • Validates denied tools return proper error results with "Permission denied" message
  • Confirms permitted tools execute successfully
  • Tests mixed permission scenarios in invoke_all

724-752: LGTM!

The TestGetPermittedDefinitions class correctly tests the definition filtering behavior. The inline comments documenting which categories STANDARD allows/denies improve test readability.

tests/integration/engine/test_agent_engine_integration.py (2)

181-182: LGTM!

The existing integration tests are correctly updated to include the category parameter, aligning with the updated BaseTool signature.

Also applies to: 262-263


308-379: LGTM!

This integration test effectively validates the permission denial flow:

  • Creates a SANDBOXED agent attempting to use a DEPLOYMENT category tool
  • Verifies the engine completes successfully (is_success=True) while the tool result contains the permission denial error
  • Confirms defense-in-depth: the tool exists in the registry but is blocked at invocation time

The test correctly demonstrates that permission denials are recoverable errors returned to the LLM, not fatal execution failures.

src/ai_company/tools/__init__.py (1)

1-27: LGTM!

The module correctly exports the new permission-related types (ToolPermissionChecker, ToolPermissionDeniedError) and updates the docstring to reflect the expanded scope. The __all__ list maintains alphabetical ordering.

src/ai_company/core/enums.py (2)

193-207: LGTM!

ToolAccessLevel is well-designed with clear hierarchical semantics documented. The five levels (SANDBOXED through ELEVATED) form a hierarchy, while CUSTOM correctly bypasses hierarchy for explicit allow/deny list control.


210-224: LGTM!

ToolCategory provides comprehensive coverage of tool types. The 12 categories align with common agent tooling patterns, and OTHER provides a sensible default for uncategorized tools.

src/ai_company/tools/invoker.py (4)

83-97: LGTM!

The constructor correctly accepts an optional permission_checker parameter as keyword-only, providing clean API design. The docstring accurately describes the behavior when None is passed.


104-114: LGTM!

get_permitted_definitions cleanly delegates to the permission checker's filter_definitions method when available, falling back to returning all definitions. This supports both the prompt-time filtering use case (React loop) and maintains backward compatibility.


116-141: LGTM!

The _check_permission method implements defense-in-depth correctly:

  • Returns None when permitted (fast path)
  • Logs at WARNING level with structured event data on denial
  • Returns a ToolResult with is_error=True and a human-readable denial reason

The logging includes all relevant context (tool_call_id, tool_name, reason) for observability.


143-184: LGTM!

The updated invoke method correctly integrates permission checking:

  • Docstring updated to reflect the 5-step flow (lookup → permissions → validation → execute → return)
  • Permission check (Lines 172-174) is positioned after tool lookup (needed to access tool.category) but before parameter validation
  • Early return on permission denial prevents unnecessary validation/execution

This implements the runtime enforcement half of the defense-in-depth strategy.

tests/unit/tools/test_permissions.py (7)

1-14: LGTM! Well-structured test module with proper markers.

The file correctly sets the 30-second timeout per test and uses @pytest.mark.unit on all test classes as required by coding guidelines.


20-41: LGTM! Clean minimal test tool implementation.

The _SimpleTool helper is appropriately scoped for testing and correctly implements the BaseTool interface with configurable name and category.


46-150: LGTM! Comprehensive access level coverage.

The test classes provide thorough coverage of each access level's category permissions, testing both allowed and denied categories. The test structure clearly documents the expected behavior of each level.


152-231: LGTM! Priority resolution tests are well-designed.

Tests correctly verify the priority chain (denied → allowed → level categories → deny) with clear documentation. The "belt-and-suspenders" test at line 166-173 explicitly validates that denied always wins, even when a tool appears in both lists.


236-264: LGTM! Denial reason tests verify user-facing messages.

Good coverage of the three denial scenarios with appropriate substring checks to ensure messages contain relevant context without being brittle to exact wording.


269-353: LGTM! Filter definitions tests are thorough.

Good coverage including empty registry edge case, allowed/denied list interaction, and the important sort-order verification at lines 339-352.


358-414: LGTM! Edge case coverage is excellent.

The whitespace normalization test (lines 409-413) correctly verifies that names are normalized at storage time, ensuring consistent matching regardless of leading/trailing whitespace in the configuration.

DESIGN_SPEC.md (3)

1645-1651: LGTM! Permission checking documentation is accurate and clear.

The documentation correctly describes the dual-integration approach (prompt-time filtering + runtime enforcement) and the priority-based permission resolution. The defense-in-depth rationale is well explained.


1750-1751: LGTM! Clear scope delineation for M3.

The note explicitly clarifies that M3 implements category-level gating only, with granular sub-constraints (workspace scope, network mode) planned for when sandboxing is implemented. This helps set appropriate expectations.


2429-2429: LGTM! Engineering conventions updated.

The tool permission checking convention is well documented with the priority-based resolution rules and integration points clearly specified.

src/ai_company/tools/permissions.py (8)

1-29: LGTM! Module setup follows all conventions.

Proper use of get_logger(__name__), TYPE_CHECKING for lazy imports, and clear module docstring documenting the priority-based resolution rules.


51-79: LGTM! Level-to-category mapping is well-structured.

The frozenset(ToolCategory) idiom for ELEVATED correctly includes all enum members. CUSTOM's empty set enforces the allowed-list-only behavior as designed.


81-103: LGTM! Constructor correctly normalizes and stores configuration.

The strip().casefold() normalization at storage time (lines 96-97) ensures case-insensitive matching works correctly. Defaulting to ToolAccessLevel.STANDARD aligns with ToolPermissions.access_level default.


105-119: LGTM! Factory method correctly converts ToolPermissions.

The conversion from tuple to frozenset for allowed/denied is necessary since ToolPermissions uses tuples (per context snippet 1). The Self return type ensures proper type inference for subclasses.


121-139: LGTM! Permission check implements priority correctly.

The method follows the documented resolution order: denied → allowed → CUSTOM denial → level categories. The normalization at lookup time (line 131) matches the storage normalization for consistent matching.


141-162: LGTM! Check method follows error handling guidelines.

Correctly logs at WARNING level with structured context before raising ToolPermissionDeniedError. The context dict provides useful debugging information.


164-188: LGTM! Denial reasons are clear and informative.

The three denial cases produce distinct, actionable messages. The docstring appropriately warns that calling this on a permitted tool returns a meaningless result.


190-215: LGTM! Filter definitions correctly uses registry and tool categories.

The method properly retrieves each tool's category from the BaseTool.category property (per context snippet 2) and filters based on permission. The explicit sort at line 205 ensures deterministic output regardless of iteration order changes.

src/ai_company/engine/agent_engine.py (6)

23-27: LGTM! Import updates are correct.

The format_task_instruction import from prompt.py replaces the previously internal helper, and ToolPermissionChecker is correctly imported for permission integration.


165-186: LGTM! Tool invoker correctly threaded through execution flow.

Creating the permission-aware invoker once at the start of run() ensures consistent permission enforcement throughout execution. Per context snippet 1, identity.tools is always present (has default_factory=ToolPermissions), so _make_tool_invoker can safely access it.


208-221: LGTM! Method signature correctly extended.

The tool_invoker parameter is properly typed as ToolInvoker | None and threaded through to the execution loop.


332-369: LGTM! Context preparation correctly uses permission-filtered tools.

The _get_tool_definitions(tool_invoker) call at line 344 ensures only permitted tools are included in the system prompt. The use of format_task_instruction from prompt.py aligns with the refactoring mentioned in the PR summary.


449-456: LGTM! Tool definitions now permission-filtered.

The method correctly delegates to tool_invoker.get_permitted_definitions() which uses the permission checker to filter definitions. The empty tuple fallback for None invoker maintains consistency.


556-564: LGTM! Tool invoker factory creates permission-aware invoker.

The method correctly creates a ToolPermissionChecker from the agent's ToolPermissions and passes it to the ToolInvoker. This enables both prompt-time filtering and runtime enforcement (defense-in-depth).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements an access-level–based tool authorization layer across the tool system and execution engine, so agents only see and can invoke tools they’re permitted to use.

Changes:

  • Added ToolPermissionChecker with ToolAccessLevel/ToolCategory and integrated it into ToolInvoker (prompt filtering + invocation-time enforcement).
  • Extended BaseTool with a category attribute and wired permission-aware tool invokers into AgentEngine/ReactLoop.
  • Added unit/integration test coverage for permission resolution, filtering, and denied-invocation behavior.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/unit/tools/test_permissions.py New unit tests for permission resolution (levels, allow/deny lists, filtering).
tests/unit/tools/test_invoker.py Adds invoker-level permission check tests and definition filtering tests.
tests/unit/tools/conftest.py Adds categorized test tools + fixtures for permission-aware invoker/registry.
tests/unit/engine/test_run_result.py Updates to use format_task_instruction after helper extraction.
tests/unit/engine/test_agent_engine.py Updates tool construction to include ToolCategory for permission model.
tests/integration/engine/test_agent_engine_integration.py Adds E2E integration test for permission-denied tool call behavior.
src/ai_company/tools/permissions.py New ToolPermissionChecker implementation + observability events.
src/ai_company/tools/invoker.py Adds optional permission checker + filters tool definitions for prompts + checks at invoke time.
src/ai_company/tools/examples/echo.py Assigns a ToolCategory to the example tool.
src/ai_company/tools/errors.py Adds ToolPermissionDeniedError to the tool error hierarchy.
src/ai_company/tools/base.py Adds BaseTool.category (with default) for permission gating.
src/ai_company/tools/init.py Exposes new permission-related exports.
src/ai_company/observability/events/tool.py Adds tool permission event constants.
src/ai_company/engine/react_loop.py Uses permitted tool definitions from invoker when calling the provider.
src/ai_company/engine/prompt.py Extracts format_task_instruction helper into prompt module.
src/ai_company/engine/agent_engine.py Creates permission-aware invoker per run; filters tools in prompt; passes invoker into loop.
src/ai_company/core/enums.py Adds ToolAccessLevel and ToolCategory enums.
src/ai_company/core/agent.py Adds ToolPermissions.access_level field.
src/ai_company/core/init.py Re-exports new enums.
DESIGN_SPEC.md Documents the permission model and project structure updates.
Comments suppressed due to low confidence (1)

src/ai_company/tools/examples/echo.py:35

  • EchoTool is categorized as ToolCategory.OTHER, but ToolPermissionChecker denies OTHER at the default ToolAccessLevel.STANDARD. As a result, the example echo tool will not appear in prompts and will be denied at invocation time for typical agents unless it’s explicitly allow-listed or the agent is ELEVATED. Consider assigning EchoTool a category that is permitted at STANDARD (or adjust the level/category mapping if OTHER is intended to be usable by default).
    def __init__(self) -> None:
        """Initialize the echo tool with a fixed schema."""
        super().__init__(
            name="echo",
            description="Echoes the input message back",
            category=ToolCategory.OTHER,
            parameters_schema={
                "type": "object",
                "properties": {"message": {"type": "string"}},
                "required": ["message"],
                "additionalProperties": False,
            },

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

name: str,
description: str = "",
parameters_schema: dict[str, Any] | None = None,
category: ToolCategory = ToolCategory.OTHER,
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseTool defaults category to ToolCategory.OTHER, but ToolPermissionChecker denies OTHER for all non-ELEVATED access levels. With permission checking enabled by default in AgentEngine, any tool implementation that doesn’t explicitly set category will be silently unavailable to STANDARD agents. Consider either (a) requiring an explicit category (no default), (b) choosing a default category that is permitted at STANDARD, or (c) including OTHER in the STANDARD allowed categories to preserve backwards compatibility for existing tools that don’t set a category.

Suggested change
category: ToolCategory = ToolCategory.OTHER,
category: ToolCategory,

Copilot uses AI. Check for mistakes.
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR implements category-level tool permission gating (ToolPermissionChecker) with a 5-level access hierarchy (sandboxed → restricted → standard → elevated + custom), wires it into AgentIdentity.tools via ToolPermissions.access_level, and integrates defense-in-depth checking into both the LLM prompt-building stage (filter_definitions) and the tool invocation stage (ToolInvoker._check_permission). The design is clean, the priority-based resolution (denied → allowed → level categories → deny) is correct, and the test coverage is comprehensive.

Two concerns remain:

  1. ELEVATED auto-includes all future ToolCategory members — using frozenset(ToolCategory) means any newly added category is automatically elevated-access with no required code change, test failure, or review gate. A curated set or assertion test would make future category additions a deliberate, visible decision.

  2. denial_reason lacks an _allowed guard — the method is documented as requiring a prior is_permitted call, but it would produce a misleading category-based message if called on a tool that is actually in the allowed list. Adding a guard would make the API robust against misuse.

Confidence Score: 4/5

  • PR is safe to merge; remaining concerns are design-level improvements, not structural blockers.
  • The core permission system is correct and well-tested. The TYPE_CHECKING import pattern (previously flagged as a blocker) is safe under Python 3.14+ which implements PEP 649 deferred annotations by default — no runtime NameError risk. Two design-level concerns remain: (1) the ELEVATED level's auto-inclusion of future categories should have an explicit review gate, and (2) denial_reason should guard against being called on permitted tools. Both are straightforward to address and do not affect the security or correctness of the current implementation.
  • src/ai_company/tools/permissions.py requires targeted improvements: add an assertion test or curated set for ELEVATED category membership, and add a guard clause in denial_reason for the allowed list.

Sequence Diagram

sequenceDiagram
    participant AE as AgentEngine.run()
    participant MTI as _make_tool_invoker()
    participant TPC as ToolPermissionChecker
    participant PC as _prepare_context()
    participant RL as ReactLoop
    participant TI as ToolInvoker.invoke()

    AE->>MTI: identity.tools (ToolPermissions)
    MTI->>TPC: from_permissions(permissions)
    TPC-->>MTI: checker (access_level, allowed, denied)
    MTI-->>AE: ToolInvoker(registry, permission_checker)

    AE->>PC: tool_invoker
    PC->>TI: get_permitted_definitions()
    TI->>TPC: filter_definitions(registry)
    TPC-->>TI: permitted ToolDefinitions (sorted)
    TI-->>PC: tuple[ToolDefinition, ...]
    Note over PC: Filtered defs used in system prompt

    AE->>RL: tool_invoker
    RL->>TI: get_permitted_definitions()
    TI->>TPC: filter_definitions(registry)
    TPC-->>TI: permitted ToolDefinitions
    TI-->>RL: list[ToolDefinition] for LLM API call

    RL->>TI: invoke(tool_call)
    TI->>TPC: is_permitted(tool_name, category)
    alt Permitted
        TPC-->>TI: True
        TI->>TI: validate params → execute
        TI-->>RL: ToolResult(is_error=False)
    else Denied
        TPC-->>TI: False
        TI->>TPC: denial_reason(tool_name, category)
        TPC-->>TI: reason string
        TI-->>RL: ToolResult(is_error=True, "Permission denied: …")
    end
Loading

Last reviewed commit: 63f8b23

super().__init__(
name="echo",
description="Echoes the input message back",
category=ToolCategory.OTHER,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference tool silently denied at default access level

EchoTool is documented as "A minimal reference implementation… useful for testing and as a starting point for new tool implementations." However, it uses ToolCategory.OTHER, which is not included in the SANDBOXED, RESTRICTED, or STANDARD _LEVEL_CATEGORIES sets. Since ToolAccessLevel.STANDARD is the default for agents (ToolPermissions.access_level), an EchoTool registered with a default-configured agent will be silently filtered out of get_permitted_definitions() and any direct invocations will return ToolResult(is_error=True, content="Permission denied: …").

This is especially problematic because the EchoTool is the stated reference implementation — developers following its pattern and using ToolCategory.OTHER (or omitting the category argument entirely, since the BaseTool default is also OTHER) will observe their tools disappearing at runtime with no compile-time or startup-time warning.

Consider either:

  1. Changing EchoTool.category to a category included in the standard levels (e.g. ToolCategory.CODE_EXECUTION), or
  2. Adding a comment explaining why OTHER is intentional (e.g. this tool is only meant for elevated/custom-level agents or direct testing without a checker).
Suggested change
category=ToolCategory.OTHER,
category=ToolCategory.CODE_EXECUTION,
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/examples/echo.py
Line: 29

Comment:
**Reference tool silently denied at default access level**

`EchoTool` is documented as "A minimal reference implementation… useful for testing and as a starting point for new tool implementations." However, it uses `ToolCategory.OTHER`, which is **not included** in the `SANDBOXED`, `RESTRICTED`, or `STANDARD` `_LEVEL_CATEGORIES` sets. Since `ToolAccessLevel.STANDARD` is the default for agents (`ToolPermissions.access_level`), an `EchoTool` registered with a default-configured agent will be silently filtered out of `get_permitted_definitions()` and any direct invocations will return `ToolResult(is_error=True, content="Permission denied: …")`.

This is especially problematic because the EchoTool is the stated reference implementation — developers following its pattern and using `ToolCategory.OTHER` (or omitting the `category` argument entirely, since the `BaseTool` default is also `OTHER`) will observe their tools disappearing at runtime with no compile-time or startup-time warning.

Consider either:
1. Changing `EchoTool.category` to a category included in the standard levels (e.g. `ToolCategory.CODE_EXECUTION`), or
2. Adding a comment explaining why `OTHER` is intentional (e.g. this tool is only meant for elevated/custom-level agents or direct testing without a checker).

```suggestion
            category=ToolCategory.CODE_EXECUTION,
```

How can I resolve this? If you propose a fix, please make it concise.

…, Copilot, and Greptile

- Make BaseTool.category required (no default) to prevent silent denial footgun
  (flagged by Copilot, Greptile x2, CodeRabbit)
- Add MemoryError/RecursionError re-raise guard in _log_completion try/except
  (flagged by silent-failure-hunter agent)
- Wrap _LEVEL_CATEGORIES in MappingProxyType for immutability compliance
  (flagged by type-design-analyzer agent)
- Add safety comment on ELEVATED frozenset(ToolCategory) auto-include behavior
- Add test for ToolPermissionDeniedError.context dict correctness
- Fix DESIGN_SPEC base.py comment (ToolCategory is in core/enums.py, not base.py)
- Fix DESIGN_SPEC routing strategies count (5, not 6)
- Add ToolAccessLevel docstring cross-reference to _LEVEL_CATEGORIES
- Improve denial_reason docstring clarity
- Verify denial reason content in invoker permission test
- Update all test tool subclasses with explicit category parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit 833c190 into main Mar 7, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the feat/tool-permission-checking branch March 7, 2026 07:32
Comment on lines +82 to +84
# all categories — new ToolCategory members are auto-included;
# review new categories with ELEVATED access in mind
ToolAccessLevel.ELEVATED: frozenset(ToolCategory),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using frozenset(ToolCategory) for the ELEVATED level means any new category member added to the enum in the future will automatically gain ELEVATED access without explicit review or test failure. Consider replacing with a curated set:

# Option 1: Explicit set (requires deliberate update)
ToolAccessLevel.ELEVATED: frozenset({
    ToolCategory.DATABASE,
    ToolCategory.DEPLOYMENT,
    # Add new high-privilege categories here
}),

# Option 2: Assertion test that catches new categories
assert frozenset(ToolCategory) == _LEVEL_CATEGORIES[ToolAccessLevel.ELEVATED]

This ensures new sensitive categories (e.g., SECRETS_MANAGER) are reviewed before automatic elevation.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/permissions.py
Line: 82-84

Comment:
Using `frozenset(ToolCategory)` for the `ELEVATED` level means any new category member added to the enum in the future will automatically gain ELEVATED access without explicit review or test failure. Consider replacing with a curated set:

```python
# Option 1: Explicit set (requires deliberate update)
ToolAccessLevel.ELEVATED: frozenset({
    ToolCategory.DATABASE,
    ToolCategory.DEPLOYMENT,
    # Add new high-privilege categories here
}),

# Option 2: Assertion test that catches new categories
assert frozenset(ToolCategory) == _LEVEL_CATEGORIES[ToolAccessLevel.ELEVATED]
```

This ensures new sensitive categories (e.g., SECRETS_MANAGER) are reviewed before automatic elevation.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +173 to +198
def denial_reason(self, tool_name: str, category: ToolCategory) -> str:
"""Return a human-readable reason why a tool would be denied.

Intended for use after confirming the tool is denied via
``is_permitted`` or via ``check``. If the tool is actually
permitted, the returned string does not apply and should not
be shown to users.

Args:
tool_name: Name of the tool.
category: Category of the tool.

Returns:
Explanation string suitable for error messages.
"""
name_lower = tool_name.strip().casefold()
if name_lower in self._denied:
return f"Tool {tool_name!r} is explicitly denied"
if self._access_level == ToolAccessLevel.CUSTOM:
return (
f"Tool {tool_name!r} is not in the allowed list (access level: custom)"
)
return (
f"Category {category.value!r} is not permitted "
f"at access level {self._access_level.value!r}"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The denial_reason method can produce misleading output if called directly on a tool in the allowed list. The docstring documents this as "Intended for use after confirming the tool is denied," but there is no guard to enforce it. If a caller forgets the precondition, the third branch returns an incorrect category-based message.

Consider adding a defensive check:

def denial_reason(self, tool_name: str, category: ToolCategory) -> str:
    name_lower = tool_name.strip().casefold()
    if name_lower in self._denied:
        return f"Tool {tool_name!r} is explicitly denied"
    if name_lower in self._allowed:
        # Guard: tool is actually permitted, should not reach here
        return f"Tool {tool_name!r} is explicitly allowed"
    if self._access_level == ToolAccessLevel.CUSTOM:
        return f"Tool {tool_name!r} is not in the allowed list (access level: custom)"
    return (
        f"Category {category.value!r} is not permitted "
        f"at access level {self._access_level.value!r}"
    )

This makes the method robust against misuse and removes the implicit precondition.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/permissions.py
Line: 173-198

Comment:
The `denial_reason` method can produce misleading output if called directly on a tool in the allowed list. The docstring documents this as "Intended for use after confirming the tool is denied," but there is no guard to enforce it. If a caller forgets the precondition, the third branch returns an incorrect category-based message.

Consider adding a defensive check:

```python
def denial_reason(self, tool_name: str, category: ToolCategory) -> str:
    name_lower = tool_name.strip().casefold()
    if name_lower in self._denied:
        return f"Tool {tool_name!r} is explicitly denied"
    if name_lower in self._allowed:
        # Guard: tool is actually permitted, should not reach here
        return f"Tool {tool_name!r} is explicitly allowed"
    if self._access_level == ToolAccessLevel.CUSTOM:
        return f"Tool {tool_name!r} is not in the allowed list (access level: custom)"
    return (
        f"Category {category.value!r} is not permitted "
        f"at access level {self._access_level.value!r}"
    )
```

This makes the method robust against misuse and removes the implicit precondition.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement tool permission checking based on role and access level

2 participants