feat: implement tool permission checking (#16)#147
Conversation
Add access-level gating for tool invocations per DESIGN_SPEC §11.2. Permission resolution uses priority order: denied list > allowed list > access level categories. Progressive trust is disabled for M3 (static access levels only). New types: ToolAccessLevel (5 levels), ToolCategory (12 categories), ToolPermissionChecker, ToolPermissionDeniedError. Permission checking integrates at both prompt filtering (LLM sees only permitted tools) and runtime enforcement (belt-and-suspenders deny on invoke). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 18 findings addressed: - Fix integration tests missing ToolCategory on test tools - Add permission-denied integration test (E2E denial path) - Add filter_definitions tests (denied list, sort order) - Fix invoke() docstring to include permission-check step - Fix ToolAccessLevel docstring re: CUSTOM level semantics - Make filter_definitions sort explicit + add DEBUG log - Change .get() to direct dict access for fail-loud on unmapped levels - Fix denial_reason/check_permission docstrings - Extract format_task_instruction to prompt.py (agent_engine < 800 lines) - Update DESIGN_SPEC.md: §3.1 access_level, §11.1.1 permission model, §11.2 M3 scope note, §15.3 project structure, §15.5 conventions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds category- and access-level-based tool permissioning: new enums (ToolAccessLevel, ToolCategory), a ToolPermissionChecker, permission-aware ToolInvoker flow (pre-validation checks and filtering), BaseTool category support, observability events, and tests/integration wiring to enforce and surface permission denials. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as AgentEngine
participant Invoker as ToolInvoker
participant Checker as ToolPermissionChecker
participant Registry as ToolRegistry
participant Tool as BaseTool
Agent->>Invoker: new(registry, permission_checker)
Agent->>Invoker: get_permitted_definitions()
Invoker->>Checker: filter_definitions(registry)
Checker-->>Invoker: permitted ToolDefinitions
Agent->>Invoker: invoke(tool_call)
Invoker->>Checker: _check_permission(tool_name, category)
alt denied
Checker-->>Invoker: raises/returns denial
Invoker-->>Agent: ToolResult(error=ToolPermissionDeniedError)
else allowed
Invoker->>Tool: validate params / execute
Tool-->>Invoker: ToolResult(success)
Invoker-->>Agent: ToolResult
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements a robust tool permission checking system, enhancing the security and control over which tools AI agents can access and execute. By introducing explicit access levels, tool categorization, and a dedicated permission checker, the system ensures that agents only interact with approved functionalities, both at the prompt generation stage and during actual tool invocation. This provides a layered defense mechanism, preventing unauthorized tool usage and improving the overall reliability and safety of agent operations. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive tool permission checking system, defining ToolAccessLevel and ToolCategory enums and integrating them into the AgentIdentity and ToolPermissions models. A new ToolPermissionChecker class is implemented to enforce these permissions, using a priority-based system of explicit deny/allow lists and hierarchical access levels (sandboxed, restricted, standard, elevated, custom) based on tool categories. The ToolInvoker now utilizes this checker to filter tool definitions presented to the LLM and to validate tool calls at invocation time, ensuring agents only use permitted tools. The AgentEngine has been updated to incorporate this new permission logic, including moving the format_task_instruction helper to prompt.py. Review comments identify a critical bug in agent_engine.py where RecursionError is not properly caught due to outdated Python 2 exception syntax, a prompt injection vulnerability in format_task_instruction due to direct embedding of untrusted task content, and suggest improving maintainability in ToolPermissionChecker by defining hierarchical permission sets incrementally.
| timeout_seconds=timeout_seconds, | ||
| tool_invoker=tool_invoker, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
The code uses the outdated Python 2 syntax for catching multiple exceptions: except MemoryError, RecursionError:. In Python 3, this is interpreted as except MemoryError as RecursionError:, which means it only catches MemoryError and assigns the exception object to the variable name RecursionError. Consequently, RecursionError is NOT caught by this block. This violates the design principle that RecursionError is a non-recoverable error that should be re-raised. Instead, it will be caught by subsequent except Exception blocks and handled as a recoverable error, which could lead to inconsistent system state.
except (MemoryError, RecursionError):| Returns: | ||
| Markdown-formatted task instruction string. | ||
| """ | ||
| parts = [f"# Task: {task.title}", "", task.description] |
There was a problem hiding this comment.
Untrusted task content (task.title and task.description) is directly included in the LLM prompt without proper sanitization or delimiting. An attacker who can control the task metadata could inject malicious instructions to manipulate the agent's behavior (Prompt Injection). It is recommended to use clear delimiters (e.g., XML tags) and instructions to the LLM to treat the content as data.
src/ai_company/tools/permissions.py
Outdated
| _LEVEL_CATEGORIES: ClassVar[dict[ToolAccessLevel, frozenset[ToolCategory]]] = { | ||
| ToolAccessLevel.SANDBOXED: frozenset( | ||
| { | ||
| ToolCategory.FILE_SYSTEM, | ||
| ToolCategory.CODE_EXECUTION, | ||
| ToolCategory.VERSION_CONTROL, | ||
| } | ||
| ), | ||
| ToolAccessLevel.RESTRICTED: frozenset( | ||
| { | ||
| ToolCategory.FILE_SYSTEM, | ||
| ToolCategory.CODE_EXECUTION, | ||
| ToolCategory.VERSION_CONTROL, | ||
| ToolCategory.WEB, | ||
| } | ||
| ), | ||
| ToolAccessLevel.STANDARD: frozenset( | ||
| { | ||
| ToolCategory.FILE_SYSTEM, | ||
| ToolCategory.CODE_EXECUTION, | ||
| ToolCategory.VERSION_CONTROL, | ||
| ToolCategory.WEB, | ||
| ToolCategory.TERMINAL, | ||
| ToolCategory.ANALYTICS, | ||
| } | ||
| ), | ||
| ToolAccessLevel.ELEVATED: frozenset(ToolCategory), | ||
| ToolAccessLevel.CUSTOM: frozenset(), | ||
| } |
There was a problem hiding this comment.
For better maintainability, you could define the hierarchical permission sets incrementally. This avoids repeating the categories for each access level and reduces the chance of inconsistencies if categories are added or changed in the future.
_SANDBOXED_CATS = frozenset(
{
ToolCategory.FILE_SYSTEM,
ToolCategory.CODE_EXECUTION,
ToolCategory.VERSION_CONTROL,
}
)
_RESTRICTED_CATS = _SANDBOXED_CATS | {ToolCategory.WEB}
_STANDARD_CATS = _RESTRICTED_CATS | {
ToolCategory.TERMINAL,
ToolCategory.ANALYTICS,
}
_LEVEL_CATEGORIES: ClassVar[dict[ToolAccessLevel, frozenset[ToolCategory]]] = {
ToolAccessLevel.SANDBOXED: _SANDBOXED_CATS,
ToolAccessLevel.RESTRICTED: _RESTRICTED_CATS,
ToolAccessLevel.STANDARD: _STANDARD_CATS,
ToolAccessLevel.ELEVATED: frozenset(ToolCategory),
ToolAccessLevel.CUSTOM: frozenset(),
}There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/tools/base.py (1)
70-77:⚠️ Potential issue | 🟠 MajorMake tool categorization explicit instead of defaulting to
OTHER.The
categoryparameter defaults toToolCategory.OTHER, and at least 13 existingBaseToolsubclasses in the codebase do not passcategory=tosuper().__init__(). These tools will silently fall back toOTHER, creating a fail-open permission model where every uncategorized legacy tool is treated as intentionally low-risk. If access-level checks permitOTHER, this becomes a security boundary violation. Require an explicit category for every tool before shipping.Suggested change
def __init__( self, *, name: str, description: str = "", parameters_schema: dict[str, Any] | None = None, - category: ToolCategory = ToolCategory.OTHER, + category: ToolCategory, ) -> None:Also applies to: 93-96, 107-110
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/base.py` around lines 70 - 77, The BaseTool constructor currently defaults category to ToolCategory.OTHER which causes silent, unsafe fallbacks; modify BaseTool.__init__ to require category (remove the default), and add a defensive runtime check (raise ValueError) if category is missing or None so instantiations fail loudly; update all BaseTool subclasses and any calls that relied on the default to pass an explicit category argument to super().__init__; also apply the same change to the other BaseTool constructor overloads/variants (the other __init__ signatures around the class) so all entry points enforce explicit ToolCategory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/ai_company/tools/base.py`:
- Around line 70-77: The BaseTool constructor currently defaults category to
ToolCategory.OTHER which causes silent, unsafe fallbacks; modify
BaseTool.__init__ to require category (remove the default), and add a defensive
runtime check (raise ValueError) if category is missing or None so
instantiations fail loudly; update all BaseTool subclasses and any calls that
relied on the default to pass an explicit category argument to super().__init__;
also apply the same change to the other BaseTool constructor overloads/variants
(the other __init__ signatures around the class) so all entry points enforce
explicit ToolCategory.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7d354dca-9812-45d9-a126-3e5c49c54331
📒 Files selected for processing (20)
DESIGN_SPEC.mdsrc/ai_company/core/__init__.pysrc/ai_company/core/agent.pysrc/ai_company/core/enums.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/prompt.pysrc/ai_company/engine/react_loop.pysrc/ai_company/observability/events/tool.pysrc/ai_company/tools/__init__.pysrc/ai_company/tools/base.pysrc/ai_company/tools/errors.pysrc/ai_company/tools/examples/echo.pysrc/ai_company/tools/invoker.pysrc/ai_company/tools/permissions.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/test_agent_engine.pytests/unit/engine/test_run_result.pytests/unit/tools/conftest.pytests/unit/tools/test_invoker.pytests/unit/tools/test_permissions.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT usefrom __future__ import annotations—Python 3.14 has PEP 649
Useexcept A, B:syntax (no parentheses) for exception handling on Python 3.14—ruff enforces this
Add type hints to all public functions in Python; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions—ruff D rules enforce this
Create new objects instead of mutating existing ones; usecopy.deepcopy()at construction for non-Pydantic internal collections andMappingProxyTypewrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 withBaseModel,model_validator,computed_field, andConfigDict
Use@computed_fieldfor derived values instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStr(fromcore.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over barecreate_task
Enforce line length of 88 characters (ruff enforces this)
Functions should be less than 50 lines, files less than 800 lines
Handle errors explicitly; never silently swallow errors in Python code
Validate at system boundaries (user input, external APIs, config files)
Files:
src/ai_company/tools/errors.pysrc/ai_company/core/__init__.pysrc/ai_company/tools/permissions.pysrc/ai_company/tools/__init__.pysrc/ai_company/engine/prompt.pysrc/ai_company/core/agent.pytests/unit/tools/test_permissions.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/tools/invoker.pysrc/ai_company/observability/events/tool.pysrc/ai_company/tools/base.pytests/unit/engine/test_run_result.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/tools/examples/echo.pytests/unit/tools/test_invoker.pytests/unit/engine/test_agent_engine.pytests/unit/tools/conftest.pysrc/ai_company/engine/react_loop.pysrc/ai_company/core/enums.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic MUST importfrom ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code
Always useloggeras the variable name for loggers (not_logger, notlog)
Use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging withlogger.info(EVENT, key=value)format—neverlogger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging should be used for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Files:
src/ai_company/tools/errors.pysrc/ai_company/core/__init__.pysrc/ai_company/tools/permissions.pysrc/ai_company/tools/__init__.pysrc/ai_company/engine/prompt.pysrc/ai_company/core/agent.pysrc/ai_company/tools/invoker.pysrc/ai_company/observability/events/tool.pysrc/ai_company/tools/base.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/tools/examples/echo.pysrc/ai_company/engine/react_loop.pysrc/ai_company/core/enums.py
{src/**/*.py,tests/**/*.py,src/**/*.yaml,src/**/*.yml,tests/**/*.yaml,tests/**/*.yml,examples/**/*.yaml,examples/**/*.yml}
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2).claude/skill/agent files, (3) third-party import paths/module names (e.g.litellm.types.llms.openai). Tests must usetest-provider,test-small-001, etc.
Files:
src/ai_company/tools/errors.pysrc/ai_company/core/__init__.pysrc/ai_company/tools/permissions.pysrc/ai_company/tools/__init__.pysrc/ai_company/engine/prompt.pysrc/ai_company/core/agent.pytests/unit/tools/test_permissions.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/tools/invoker.pysrc/ai_company/observability/events/tool.pysrc/ai_company/tools/base.pytests/unit/engine/test_run_result.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/tools/examples/echo.pytests/unit/tools/test_invoker.pytests/unit/engine/test_agent_engine.pytests/unit/tools/conftest.pysrc/ai_company/engine/react_loop.pysrc/ai_company/core/enums.py
src/ai_company/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedErrorsignals that all retries failed—the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/react_loop.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark unit tests with@pytest.mark.unit, integration tests with@pytest.mark.integration, e2e tests with@pytest.mark.e2e, and slow tests with@pytest.mark.slow
Useasyncio_mode = 'auto'for pytest async tests—no manual@pytest.mark.asyncioneeded
Set a 30-second timeout per test
Files:
tests/unit/tools/test_permissions.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/test_run_result.pytests/unit/tools/test_invoker.pytests/unit/engine/test_agent_engine.pytests/unit/tools/conftest.py
🧠 Learnings (3)
📚 Learning: 2026-03-06T21:51:55.175Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T21:51:55.175Z
Learning: Applies to **/*.py : Use Pydantic v2 with `BaseModel`, `model_validator`, `computed_field`, and `ConfigDict`
Applied to files:
src/ai_company/tools/base.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/engine/test_run_result.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use BaseAgent methods for Ollama integration: `self.generate(prompt)` for LLM calls with built-in retry logic, `self.settings` for model configuration, and `self.client` for Ollama client instance
Applied to files:
tests/unit/engine/test_run_result.py
🔇 Additional comments (45)
src/ai_company/tools/errors.py (1)
59-60: Nice addition of a dedicated permission error.A distinct
ToolPermissionDeniedErrormakes the denied path much easier to handle explicitly in the invoker, engine, and tests.src/ai_company/engine/prompt.py (1)
641-665: Good extraction.Pulling task-message rendering into a dedicated public helper keeps the prompt-building path smaller and makes this formatting logic easier to test in isolation.
src/ai_company/observability/events/tool.py (1)
22-24: Good observability hooks.These events cover the important permission checkpoints and should make denied/filtering behavior much easier to trace in production logs.
tests/unit/engine/test_run_result.py (1)
19-20: Good coverage for the extracted formatter.The updated tests exercise the main output combinations around criteria, budget, and deadline formatting, which makes the API move low-risk.
Also applies to: 201-205, 213-214, 221-221, 241-242, 262-262, 285-285
src/ai_company/tools/examples/echo.py (1)
9-9: Nice tightening of the example tool.Adding an explicit category and making the schema strict keeps the example aligned with the new permission model and avoids silently accepting extra input.
Also applies to: 29-35
src/ai_company/core/agent.py (1)
141-152: Clean config-model extension.Adding
access_leveldirectly toToolPermissionsfits the new permission model well and keeps the static tool policy on the frozen identity card where it belongs.tests/unit/tools/conftest.py (1)
407-463: Useful permission-test scaffolding.The categorized fixture registry makes the access-level tests much easier to read and should keep follow-up permission cases cheap to add.
src/ai_company/engine/react_loop.py (1)
409-416: LGTM!The change correctly integrates permission-aware tool definition retrieval. The
get_permitted_definitions()method handles both permission-filtered and unfiltered cases appropriately, and the docstring accurately reflects the new behavior.tests/unit/engine/test_agent_engine.py (1)
296-316: LGTM!The test correctly adds the required
categoryparameter to align with the updatedBaseToolsignature. UsingToolCategory.CODE_EXECUTIONis appropriate for this test tool, and the lazy import pattern keeps the test self-contained.src/ai_company/core/__init__.py (1)
36-37: LGTM!The new enums
ToolAccessLevelandToolCategoryare correctly imported and exported, maintaining alphabetical ordering in__all__.Also applies to: 98-99
tests/unit/tools/test_invoker.py (2)
671-722: LGTM!The
TestInvokerPermissionCheckclass provides comprehensive coverage for permission checking behavior:
- Verifies no-checker allows all tools
- Validates denied tools return proper error results with "Permission denied" message
- Confirms permitted tools execute successfully
- Tests mixed permission scenarios in
invoke_all
724-752: LGTM!The
TestGetPermittedDefinitionsclass correctly tests the definition filtering behavior. The inline comments documenting which categoriesSTANDARDallows/denies improve test readability.tests/integration/engine/test_agent_engine_integration.py (2)
181-182: LGTM!The existing integration tests are correctly updated to include the
categoryparameter, aligning with the updatedBaseToolsignature.Also applies to: 262-263
308-379: LGTM!This integration test effectively validates the permission denial flow:
- Creates a
SANDBOXEDagent attempting to use aDEPLOYMENTcategory tool- Verifies the engine completes successfully (
is_success=True) while the tool result contains the permission denial error- Confirms defense-in-depth: the tool exists in the registry but is blocked at invocation time
The test correctly demonstrates that permission denials are recoverable errors returned to the LLM, not fatal execution failures.
src/ai_company/tools/__init__.py (1)
1-27: LGTM!The module correctly exports the new permission-related types (
ToolPermissionChecker,ToolPermissionDeniedError) and updates the docstring to reflect the expanded scope. The__all__list maintains alphabetical ordering.src/ai_company/core/enums.py (2)
193-207: LGTM!
ToolAccessLevelis well-designed with clear hierarchical semantics documented. The five levels (SANDBOXEDthroughELEVATED) form a hierarchy, whileCUSTOMcorrectly bypasses hierarchy for explicit allow/deny list control.
210-224: LGTM!
ToolCategoryprovides comprehensive coverage of tool types. The 12 categories align with common agent tooling patterns, andOTHERprovides a sensible default for uncategorized tools.src/ai_company/tools/invoker.py (4)
83-97: LGTM!The constructor correctly accepts an optional
permission_checkerparameter as keyword-only, providing clean API design. The docstring accurately describes the behavior whenNoneis passed.
104-114: LGTM!
get_permitted_definitionscleanly delegates to the permission checker'sfilter_definitionsmethod when available, falling back to returning all definitions. This supports both the prompt-time filtering use case (React loop) and maintains backward compatibility.
116-141: LGTM!The
_check_permissionmethod implements defense-in-depth correctly:
- Returns
Nonewhen permitted (fast path)- Logs at WARNING level with structured event data on denial
- Returns a
ToolResultwithis_error=Trueand a human-readable denial reasonThe logging includes all relevant context (
tool_call_id,tool_name,reason) for observability.
143-184: LGTM!The updated
invokemethod correctly integrates permission checking:
- Docstring updated to reflect the 5-step flow (lookup → permissions → validation → execute → return)
- Permission check (Lines 172-174) is positioned after tool lookup (needed to access
tool.category) but before parameter validation- Early return on permission denial prevents unnecessary validation/execution
This implements the runtime enforcement half of the defense-in-depth strategy.
tests/unit/tools/test_permissions.py (7)
1-14: LGTM! Well-structured test module with proper markers.The file correctly sets the 30-second timeout per test and uses
@pytest.mark.uniton all test classes as required by coding guidelines.
20-41: LGTM! Clean minimal test tool implementation.The
_SimpleToolhelper is appropriately scoped for testing and correctly implements theBaseToolinterface with configurable name and category.
46-150: LGTM! Comprehensive access level coverage.The test classes provide thorough coverage of each access level's category permissions, testing both allowed and denied categories. The test structure clearly documents the expected behavior of each level.
152-231: LGTM! Priority resolution tests are well-designed.Tests correctly verify the priority chain (denied → allowed → level categories → deny) with clear documentation. The "belt-and-suspenders" test at line 166-173 explicitly validates that denied always wins, even when a tool appears in both lists.
236-264: LGTM! Denial reason tests verify user-facing messages.Good coverage of the three denial scenarios with appropriate substring checks to ensure messages contain relevant context without being brittle to exact wording.
269-353: LGTM! Filter definitions tests are thorough.Good coverage including empty registry edge case, allowed/denied list interaction, and the important sort-order verification at lines 339-352.
358-414: LGTM! Edge case coverage is excellent.The whitespace normalization test (lines 409-413) correctly verifies that names are normalized at storage time, ensuring consistent matching regardless of leading/trailing whitespace in the configuration.
DESIGN_SPEC.md (3)
1645-1651: LGTM! Permission checking documentation is accurate and clear.The documentation correctly describes the dual-integration approach (prompt-time filtering + runtime enforcement) and the priority-based permission resolution. The defense-in-depth rationale is well explained.
1750-1751: LGTM! Clear scope delineation for M3.The note explicitly clarifies that M3 implements category-level gating only, with granular sub-constraints (workspace scope, network mode) planned for when sandboxing is implemented. This helps set appropriate expectations.
2429-2429: LGTM! Engineering conventions updated.The tool permission checking convention is well documented with the priority-based resolution rules and integration points clearly specified.
src/ai_company/tools/permissions.py (8)
1-29: LGTM! Module setup follows all conventions.Proper use of
get_logger(__name__), TYPE_CHECKING for lazy imports, and clear module docstring documenting the priority-based resolution rules.
51-79: LGTM! Level-to-category mapping is well-structured.The
frozenset(ToolCategory)idiom for ELEVATED correctly includes all enum members. CUSTOM's empty set enforces the allowed-list-only behavior as designed.
81-103: LGTM! Constructor correctly normalizes and stores configuration.The
strip().casefold()normalization at storage time (lines 96-97) ensures case-insensitive matching works correctly. Defaulting toToolAccessLevel.STANDARDaligns withToolPermissions.access_leveldefault.
105-119: LGTM! Factory method correctly converts ToolPermissions.The conversion from tuple to frozenset for allowed/denied is necessary since
ToolPermissionsuses tuples (per context snippet 1). TheSelfreturn type ensures proper type inference for subclasses.
121-139: LGTM! Permission check implements priority correctly.The method follows the documented resolution order: denied → allowed → CUSTOM denial → level categories. The normalization at lookup time (line 131) matches the storage normalization for consistent matching.
141-162: LGTM! Check method follows error handling guidelines.Correctly logs at WARNING level with structured context before raising
ToolPermissionDeniedError. The context dict provides useful debugging information.
164-188: LGTM! Denial reasons are clear and informative.The three denial cases produce distinct, actionable messages. The docstring appropriately warns that calling this on a permitted tool returns a meaningless result.
190-215: LGTM! Filter definitions correctly uses registry and tool categories.The method properly retrieves each tool's category from the
BaseTool.categoryproperty (per context snippet 2) and filters based on permission. The explicit sort at line 205 ensures deterministic output regardless of iteration order changes.src/ai_company/engine/agent_engine.py (6)
23-27: LGTM! Import updates are correct.The
format_task_instructionimport fromprompt.pyreplaces the previously internal helper, andToolPermissionCheckeris correctly imported for permission integration.
165-186: LGTM! Tool invoker correctly threaded through execution flow.Creating the permission-aware invoker once at the start of
run()ensures consistent permission enforcement throughout execution. Per context snippet 1,identity.toolsis always present (hasdefault_factory=ToolPermissions), so_make_tool_invokercan safely access it.
208-221: LGTM! Method signature correctly extended.The
tool_invokerparameter is properly typed asToolInvoker | Noneand threaded through to the execution loop.
332-369: LGTM! Context preparation correctly uses permission-filtered tools.The
_get_tool_definitions(tool_invoker)call at line 344 ensures only permitted tools are included in the system prompt. The use offormat_task_instructionfromprompt.pyaligns with the refactoring mentioned in the PR summary.
449-456: LGTM! Tool definitions now permission-filtered.The method correctly delegates to
tool_invoker.get_permitted_definitions()which uses the permission checker to filter definitions. The empty tuple fallback forNoneinvoker maintains consistency.
556-564: LGTM! Tool invoker factory creates permission-aware invoker.The method correctly creates a
ToolPermissionCheckerfrom the agent'sToolPermissionsand passes it to theToolInvoker. This enables both prompt-time filtering and runtime enforcement (defense-in-depth).
There was a problem hiding this comment.
Pull request overview
Implements an access-level–based tool authorization layer across the tool system and execution engine, so agents only see and can invoke tools they’re permitted to use.
Changes:
- Added
ToolPermissionCheckerwithToolAccessLevel/ToolCategoryand integrated it intoToolInvoker(prompt filtering + invocation-time enforcement). - Extended
BaseToolwith acategoryattribute and wired permission-aware tool invokers intoAgentEngine/ReactLoop. - Added unit/integration test coverage for permission resolution, filtering, and denied-invocation behavior.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tools/test_permissions.py | New unit tests for permission resolution (levels, allow/deny lists, filtering). |
| tests/unit/tools/test_invoker.py | Adds invoker-level permission check tests and definition filtering tests. |
| tests/unit/tools/conftest.py | Adds categorized test tools + fixtures for permission-aware invoker/registry. |
| tests/unit/engine/test_run_result.py | Updates to use format_task_instruction after helper extraction. |
| tests/unit/engine/test_agent_engine.py | Updates tool construction to include ToolCategory for permission model. |
| tests/integration/engine/test_agent_engine_integration.py | Adds E2E integration test for permission-denied tool call behavior. |
| src/ai_company/tools/permissions.py | New ToolPermissionChecker implementation + observability events. |
| src/ai_company/tools/invoker.py | Adds optional permission checker + filters tool definitions for prompts + checks at invoke time. |
| src/ai_company/tools/examples/echo.py | Assigns a ToolCategory to the example tool. |
| src/ai_company/tools/errors.py | Adds ToolPermissionDeniedError to the tool error hierarchy. |
| src/ai_company/tools/base.py | Adds BaseTool.category (with default) for permission gating. |
| src/ai_company/tools/init.py | Exposes new permission-related exports. |
| src/ai_company/observability/events/tool.py | Adds tool permission event constants. |
| src/ai_company/engine/react_loop.py | Uses permitted tool definitions from invoker when calling the provider. |
| src/ai_company/engine/prompt.py | Extracts format_task_instruction helper into prompt module. |
| src/ai_company/engine/agent_engine.py | Creates permission-aware invoker per run; filters tools in prompt; passes invoker into loop. |
| src/ai_company/core/enums.py | Adds ToolAccessLevel and ToolCategory enums. |
| src/ai_company/core/agent.py | Adds ToolPermissions.access_level field. |
| src/ai_company/core/init.py | Re-exports new enums. |
| DESIGN_SPEC.md | Documents the permission model and project structure updates. |
Comments suppressed due to low confidence (1)
src/ai_company/tools/examples/echo.py:35
EchoToolis categorized asToolCategory.OTHER, butToolPermissionCheckerdeniesOTHERat the defaultToolAccessLevel.STANDARD. As a result, the exampleechotool will not appear in prompts and will be denied at invocation time for typical agents unless it’s explicitly allow-listed or the agent is ELEVATED. Consider assigningEchoToola category that is permitted at STANDARD (or adjust the level/category mapping ifOTHERis intended to be usable by default).
def __init__(self) -> None:
"""Initialize the echo tool with a fixed schema."""
super().__init__(
name="echo",
description="Echoes the input message back",
category=ToolCategory.OTHER,
parameters_schema={
"type": "object",
"properties": {"message": {"type": "string"}},
"required": ["message"],
"additionalProperties": False,
},
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/ai_company/tools/base.py
Outdated
| name: str, | ||
| description: str = "", | ||
| parameters_schema: dict[str, Any] | None = None, | ||
| category: ToolCategory = ToolCategory.OTHER, |
There was a problem hiding this comment.
BaseTool defaults category to ToolCategory.OTHER, but ToolPermissionChecker denies OTHER for all non-ELEVATED access levels. With permission checking enabled by default in AgentEngine, any tool implementation that doesn’t explicitly set category will be silently unavailable to STANDARD agents. Consider either (a) requiring an explicit category (no default), (b) choosing a default category that is permitted at STANDARD, or (c) including OTHER in the STANDARD allowed categories to preserve backwards compatibility for existing tools that don’t set a category.
| category: ToolCategory = ToolCategory.OTHER, | |
| category: ToolCategory, |
Greptile SummaryThis PR implements category-level tool permission gating ( Two concerns remain:
Confidence Score: 4/5
Sequence DiagramsequenceDiagram
participant AE as AgentEngine.run()
participant MTI as _make_tool_invoker()
participant TPC as ToolPermissionChecker
participant PC as _prepare_context()
participant RL as ReactLoop
participant TI as ToolInvoker.invoke()
AE->>MTI: identity.tools (ToolPermissions)
MTI->>TPC: from_permissions(permissions)
TPC-->>MTI: checker (access_level, allowed, denied)
MTI-->>AE: ToolInvoker(registry, permission_checker)
AE->>PC: tool_invoker
PC->>TI: get_permitted_definitions()
TI->>TPC: filter_definitions(registry)
TPC-->>TI: permitted ToolDefinitions (sorted)
TI-->>PC: tuple[ToolDefinition, ...]
Note over PC: Filtered defs used in system prompt
AE->>RL: tool_invoker
RL->>TI: get_permitted_definitions()
TI->>TPC: filter_definitions(registry)
TPC-->>TI: permitted ToolDefinitions
TI-->>RL: list[ToolDefinition] for LLM API call
RL->>TI: invoke(tool_call)
TI->>TPC: is_permitted(tool_name, category)
alt Permitted
TPC-->>TI: True
TI->>TI: validate params → execute
TI-->>RL: ToolResult(is_error=False)
else Denied
TPC-->>TI: False
TI->>TPC: denial_reason(tool_name, category)
TPC-->>TI: reason string
TI-->>RL: ToolResult(is_error=True, "Permission denied: …")
end
Last reviewed commit: 63f8b23 |
| super().__init__( | ||
| name="echo", | ||
| description="Echoes the input message back", | ||
| category=ToolCategory.OTHER, |
There was a problem hiding this comment.
Reference tool silently denied at default access level
EchoTool is documented as "A minimal reference implementation… useful for testing and as a starting point for new tool implementations." However, it uses ToolCategory.OTHER, which is not included in the SANDBOXED, RESTRICTED, or STANDARD _LEVEL_CATEGORIES sets. Since ToolAccessLevel.STANDARD is the default for agents (ToolPermissions.access_level), an EchoTool registered with a default-configured agent will be silently filtered out of get_permitted_definitions() and any direct invocations will return ToolResult(is_error=True, content="Permission denied: …").
This is especially problematic because the EchoTool is the stated reference implementation — developers following its pattern and using ToolCategory.OTHER (or omitting the category argument entirely, since the BaseTool default is also OTHER) will observe their tools disappearing at runtime with no compile-time or startup-time warning.
Consider either:
- Changing
EchoTool.categoryto a category included in the standard levels (e.g.ToolCategory.CODE_EXECUTION), or - Adding a comment explaining why
OTHERis intentional (e.g. this tool is only meant for elevated/custom-level agents or direct testing without a checker).
| category=ToolCategory.OTHER, | |
| category=ToolCategory.CODE_EXECUTION, |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/examples/echo.py
Line: 29
Comment:
**Reference tool silently denied at default access level**
`EchoTool` is documented as "A minimal reference implementation… useful for testing and as a starting point for new tool implementations." However, it uses `ToolCategory.OTHER`, which is **not included** in the `SANDBOXED`, `RESTRICTED`, or `STANDARD` `_LEVEL_CATEGORIES` sets. Since `ToolAccessLevel.STANDARD` is the default for agents (`ToolPermissions.access_level`), an `EchoTool` registered with a default-configured agent will be silently filtered out of `get_permitted_definitions()` and any direct invocations will return `ToolResult(is_error=True, content="Permission denied: …")`.
This is especially problematic because the EchoTool is the stated reference implementation — developers following its pattern and using `ToolCategory.OTHER` (or omitting the `category` argument entirely, since the `BaseTool` default is also `OTHER`) will observe their tools disappearing at runtime with no compile-time or startup-time warning.
Consider either:
1. Changing `EchoTool.category` to a category included in the standard levels (e.g. `ToolCategory.CODE_EXECUTION`), or
2. Adding a comment explaining why `OTHER` is intentional (e.g. this tool is only meant for elevated/custom-level agents or direct testing without a checker).
```suggestion
category=ToolCategory.CODE_EXECUTION,
```
How can I resolve this? If you propose a fix, please make it concise.…, Copilot, and Greptile - Make BaseTool.category required (no default) to prevent silent denial footgun (flagged by Copilot, Greptile x2, CodeRabbit) - Add MemoryError/RecursionError re-raise guard in _log_completion try/except (flagged by silent-failure-hunter agent) - Wrap _LEVEL_CATEGORIES in MappingProxyType for immutability compliance (flagged by type-design-analyzer agent) - Add safety comment on ELEVATED frozenset(ToolCategory) auto-include behavior - Add test for ToolPermissionDeniedError.context dict correctness - Fix DESIGN_SPEC base.py comment (ToolCategory is in core/enums.py, not base.py) - Fix DESIGN_SPEC routing strategies count (5, not 6) - Add ToolAccessLevel docstring cross-reference to _LEVEL_CATEGORIES - Improve denial_reason docstring clarity - Verify denial reason content in invoker permission test - Update all test tool subclasses with explicit category parameter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| # all categories — new ToolCategory members are auto-included; | ||
| # review new categories with ELEVATED access in mind | ||
| ToolAccessLevel.ELEVATED: frozenset(ToolCategory), |
There was a problem hiding this comment.
Using frozenset(ToolCategory) for the ELEVATED level means any new category member added to the enum in the future will automatically gain ELEVATED access without explicit review or test failure. Consider replacing with a curated set:
# Option 1: Explicit set (requires deliberate update)
ToolAccessLevel.ELEVATED: frozenset({
ToolCategory.DATABASE,
ToolCategory.DEPLOYMENT,
# Add new high-privilege categories here
}),
# Option 2: Assertion test that catches new categories
assert frozenset(ToolCategory) == _LEVEL_CATEGORIES[ToolAccessLevel.ELEVATED]This ensures new sensitive categories (e.g., SECRETS_MANAGER) are reviewed before automatic elevation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/permissions.py
Line: 82-84
Comment:
Using `frozenset(ToolCategory)` for the `ELEVATED` level means any new category member added to the enum in the future will automatically gain ELEVATED access without explicit review or test failure. Consider replacing with a curated set:
```python
# Option 1: Explicit set (requires deliberate update)
ToolAccessLevel.ELEVATED: frozenset({
ToolCategory.DATABASE,
ToolCategory.DEPLOYMENT,
# Add new high-privilege categories here
}),
# Option 2: Assertion test that catches new categories
assert frozenset(ToolCategory) == _LEVEL_CATEGORIES[ToolAccessLevel.ELEVATED]
```
This ensures new sensitive categories (e.g., SECRETS_MANAGER) are reviewed before automatic elevation.
How can I resolve this? If you propose a fix, please make it concise.| def denial_reason(self, tool_name: str, category: ToolCategory) -> str: | ||
| """Return a human-readable reason why a tool would be denied. | ||
|
|
||
| Intended for use after confirming the tool is denied via | ||
| ``is_permitted`` or via ``check``. If the tool is actually | ||
| permitted, the returned string does not apply and should not | ||
| be shown to users. | ||
|
|
||
| Args: | ||
| tool_name: Name of the tool. | ||
| category: Category of the tool. | ||
|
|
||
| Returns: | ||
| Explanation string suitable for error messages. | ||
| """ | ||
| name_lower = tool_name.strip().casefold() | ||
| if name_lower in self._denied: | ||
| return f"Tool {tool_name!r} is explicitly denied" | ||
| if self._access_level == ToolAccessLevel.CUSTOM: | ||
| return ( | ||
| f"Tool {tool_name!r} is not in the allowed list (access level: custom)" | ||
| ) | ||
| return ( | ||
| f"Category {category.value!r} is not permitted " | ||
| f"at access level {self._access_level.value!r}" | ||
| ) |
There was a problem hiding this comment.
The denial_reason method can produce misleading output if called directly on a tool in the allowed list. The docstring documents this as "Intended for use after confirming the tool is denied," but there is no guard to enforce it. If a caller forgets the precondition, the third branch returns an incorrect category-based message.
Consider adding a defensive check:
def denial_reason(self, tool_name: str, category: ToolCategory) -> str:
name_lower = tool_name.strip().casefold()
if name_lower in self._denied:
return f"Tool {tool_name!r} is explicitly denied"
if name_lower in self._allowed:
# Guard: tool is actually permitted, should not reach here
return f"Tool {tool_name!r} is explicitly allowed"
if self._access_level == ToolAccessLevel.CUSTOM:
return f"Tool {tool_name!r} is not in the allowed list (access level: custom)"
return (
f"Category {category.value!r} is not permitted "
f"at access level {self._access_level.value!r}"
)This makes the method robust against misuse and removes the implicit precondition.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/permissions.py
Line: 173-198
Comment:
The `denial_reason` method can produce misleading output if called directly on a tool in the allowed list. The docstring documents this as "Intended for use after confirming the tool is denied," but there is no guard to enforce it. If a caller forgets the precondition, the third branch returns an incorrect category-based message.
Consider adding a defensive check:
```python
def denial_reason(self, tool_name: str, category: ToolCategory) -> str:
name_lower = tool_name.strip().casefold()
if name_lower in self._denied:
return f"Tool {tool_name!r} is explicitly denied"
if name_lower in self._allowed:
# Guard: tool is actually permitted, should not reach here
return f"Tool {tool_name!r} is explicitly allowed"
if self._access_level == ToolAccessLevel.CUSTOM:
return f"Tool {tool_name!r} is not in the allowed list (access level: custom)"
return (
f"Category {category.value!r} is not permitted "
f"at access level {self._access_level.value!r}"
)
```
This makes the method robust against misuse and removes the implicit precondition.
How can I resolve this? If you propose a fix, please make it concise.
Summary
ToolPermissionCheckerenforces access-level gating with priority-based resolution: denied list → allowed list → level categories → denyBaseTool.categoryattribute (ToolCategoryenum) enables per-tool categorization for permission checksToolInvokerintegrates permission checking: filters tool definitions for LLM prompt + checks at invocation time (defense-in-depth)AgentEnginecreates permission-aware invokers fromAgentIdentity.toolsat the start of eachrun()callToolPermissionDeniedErrorin the tool error hierarchyToolAccessLevelandToolCategoryenums added tocore.enumsToolPermissions.access_levelfield added to the agent identity cardPre-PR Review Fixes (10 agents, 18 findings addressed)
ToolCategoryon test toolsfilter_definitionstests (denied list, sort order)invoke()docstring to include permission-check stepToolAccessLeveldocstring re: CUSTOM level semanticsfilter_definitionssort explicit + added DEBUG log for filtered-out tools.get()to direct dict access for fail-loud on unmapped access levelsdenial_reason/_check_permissiondocstringsformat_task_instructiontoprompt.py(agent_engine.py under 800 lines)Closes #16
Test plan
Review coverage
Pre-reviewed by 10 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency. 27 findings triaged, 18 implemented, 9 skipped (by design / future scope / too minor).