
feat: implement subprocess sandbox for tool execution isolation (#131) #153

Merged
Aureliolo merged 3 commits into main from feat/subprocess-sandbox
Mar 7, 2026


@Aureliolo
Owner

Summary

  • Subprocess sandbox backend (SubprocessSandbox) implementing the SandboxBackend protocol for tool execution isolation
  • Environment filtering: allowlist + denylist with fnmatch patterns, library injection var blocking (LD_PRELOAD, PYTHONPATH, etc.), case-insensitive matching on Windows
  • PATH restriction: filters to safe system directories with secure fallback (never exposes unfiltered PATH)
  • Workspace boundary enforcement: rejects commands with cwd outside the workspace
  • Timeout + process-group kill: kills entire process group on Unix to prevent orphaned grandchild processes
  • Git tool integration: all 6 git tools (GitStatusTool, GitLogTool, GitDiffTool, GitBranchTool, GitCommitTool, GitCloneTool) accept optional SandboxBackend via _BaseGitTool, with git hardening env vars passed as env_overrides
  • Security hardening: secret-substring filtering in direct git subprocess path, MappingProxyType for hardening overrides, http:// removed from allowed clone schemes
  • Correctness fixes: timeout=0.0 no longer silently uses default, returncode=0 no longer mapped to -1, proper raise from exc chaining
  • DESIGN_SPEC.md updates: fixed sandbox filenames, added missing files, updated implementation status, added sandbox.py to events listing
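The allowlist-then-denylist environment filtering described above can be illustrated with a short sketch. The pattern lists and the `filter_env` name are hypothetical, not the PR's `SubprocessSandboxConfig` defaults, and the sketch assumes the denylist wins when both lists match. Case-insensitive mode upper-cases names and patterns before calling `fnmatch.fnmatchcase`, which approximates the Windows behaviour the PR describes.

```python
from __future__ import annotations

import fnmatch
import os
from collections.abc import Mapping

# Hypothetical pattern lists; the PR's real defaults live in SubprocessSandboxConfig.
ALLOWLIST = ("HOME", "LANG", "LC_*", "PATH", "TERM", "TMPDIR", "GIT_*")
DENYLIST = ("LD_PRELOAD", "LD_LIBRARY_PATH", "DYLD_*", "PYTHONPATH", "*_TOKEN", "*SECRET*")


def _matches_any(name: str, patterns: tuple[str, ...], *, case_insensitive: bool) -> bool:
    if case_insensitive:
        name = name.upper()
        patterns = tuple(p.upper() for p in patterns)
    # fnmatchcase never normalizes case on its own, so behaviour is explicit on every OS.
    # It also matches exact names, so no separate == comparison is needed.
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def filter_env(
    env: Mapping[str, str], *, case_insensitive: bool = os.name == "nt"
) -> dict[str, str]:
    """Keep allowlisted variables, then drop any that a denylist pattern matches."""
    return {
        name: value
        for name, value in env.items()
        if _matches_any(name, ALLOWLIST, case_insensitive=case_insensitive)
        and not _matches_any(name, DENYLIST, case_insensitive=case_insensitive)
    }
```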

Files

New

  • src/ai_company/tools/sandbox/__init__.py, config.py, errors.py, protocol.py, result.py, subprocess_sandbox.py
  • src/ai_company/observability/events/sandbox.py — 11 SANDBOX_* event constants
  • tests/unit/tools/sandbox/ — config, errors, protocol, result, subprocess_sandbox tests
  • tests/unit/tools/git/test_git_sandbox_integration.py — sandbox integration for 5 git tools
  • tests/integration/tools/test_sandbox_integration.py — real git + sandbox e2e
  • tests/integration/tools/conftest.py — shared git_repo fixture

Modified

  • src/ai_company/tools/_git_base.py — sandbox integration, secret filtering, MappingProxyType
  • src/ai_company/tools/git_tools.py — remove http:// from clone schemes
  • DESIGN_SPEC.md — §11.1.2 and §15.3 updates

Test plan

  • All 2387 tests pass (uv run pytest tests/ -n auto)
  • Coverage: 96.43% (threshold: 80%)
  • mypy: 0 errors across 275 files
  • ruff: all checks pass, formatting clean
  • Pre-commit hooks: all pass

Review coverage

Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 21 findings triaged and addressed.

Closes #131

Aureliolo and others added 2 commits March 7, 2026 12:28
Add a pluggable SandboxBackend protocol with a subprocess-based
implementation that provides environment filtering, workspace boundary
enforcement, timeout management, and PATH restriction for git tools.

- SandboxBackend protocol (runtime_checkable) with execute/cleanup/health_check
- SubprocessSandbox implementation with env allowlist/denylist, PATH filtering
- SandboxResult frozen model with computed success field
- SubprocessSandboxConfig with timeout, workspace_only, restricted_path options
- SandboxError hierarchy inheriting from ToolError
- Git tools integration: optional sandbox injection, dual code path in _run_git
- Sandbox event constants for observability
- 100% test coverage on sandbox module (unit + integration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
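A minimal sketch of the protocol and result shapes this commit describes, assuming keyword arguments named after the `execute(command=..., args=..., cwd, env_overrides, timeout)` call in the walkthrough diagram. It swaps the PR's frozen Pydantic model and `@computed_field` for a frozen dataclass with a property so that it runs on the standard library alone; exact field names and types are assumptions.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class SandboxResult:
    """Outcome of one sandboxed command (stand-in for the PR's Pydantic model)."""

    returncode: int
    stdout: str = ""
    stderr: str = ""
    timed_out: bool = False

    @property
    def success(self) -> bool:
        # A command succeeds only if it exited 0 and was not killed on timeout.
        return self.returncode == 0 and not self.timed_out


@runtime_checkable
class SandboxBackend(Protocol):
    """Pluggable backend: subprocess today, Docker/K8s planned."""

    async def execute(
        self,
        *,
        command: str,
        args: tuple[str, ...],
        cwd: Path | None = None,
        env_overrides: Mapping[str, str] | None = None,
        timeout: float | None = None,
    ) -> SandboxResult: ...

    async def cleanup(self) -> None: ...

    async def health_check(self) -> bool: ...
```

`@runtime_checkable` only checks that the methods exist, not their signatures, so `isinstance(backend, SandboxBackend)` is a cheap guard rather than a full conformance test.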
…rage

Pre-reviewed by 10 agents, 21 findings addressed.

Security:
- Block library injection env vars (LD_PRELOAD, PYTHONPATH, etc.) in denylist
- PATH fallback uses safe dirs instead of unfiltered original PATH
- Remove http:// from allowed clone schemes (cleartext credential risk)
- Add secret-substring filtering in direct git subprocess path
- Case-insensitive env var matching on Windows

Correctness:
- Fix timeout=0.0 silently using default (use `is not None` check)
- Fix returncode=0 mapped to -1 (explicit None check)
- Add process-group kill on Unix to prevent orphaned grandchildren
- Use `raise from exc` instead of `from None` for proper chaining

Quality:
- Extract shared _init_repo fixture, remove test code duplication
- Add sandbox tests for GitDiffTool, GitBranchTool, GitCommitTool
- Add denylist test for library injection entries
- Use MappingProxyType for _GIT_HARDENING_OVERRIDES
- Update DESIGN_SPEC.md: fix sandbox filenames, add missing files,
  update implementation status, add sandbox.py to events listing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
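The three correctness fixes listed above come down to idioms like these. `DEFAULT_TIMEOUT_SECONDS`, `effective_timeout`, `effective_returncode`, and `spawn` are hypothetical names, and `SandboxError` here is a plain stand-in for the PR's class (which derives from `ToolError`).

```python
from __future__ import annotations

DEFAULT_TIMEOUT_SECONDS = 30.0  # hypothetical default, for illustration only


class SandboxError(Exception):
    """Stand-in for the PR's SandboxError (which derives from ToolError)."""


def effective_timeout(timeout: float | None) -> float:
    # `timeout or DEFAULT_TIMEOUT_SECONDS` would silently replace 0.0 with the default.
    return timeout if timeout is not None else DEFAULT_TIMEOUT_SECONDS


def effective_returncode(returncode: int | None) -> int:
    # `returncode or -1` would turn a clean exit (0) into -1.
    return returncode if returncode is not None else -1


def spawn(command: str) -> None:
    try:
        raise FileNotFoundError(command)  # simulate a missing executable
    except OSError as exc:
        # `from exc` keeps the OSError as __cause__; `from None` would hide it.
        raise SandboxError(f"failed to spawn {command!r}") from exc
```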
Copilot AI review requested due to automatic review settings March 7, 2026 11:54
@github-actions
Contributor

github-actions bot commented Mar 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new subprocess-based sandboxing mechanism to enhance the security and isolation of external tool executions. By implementing a SandboxBackend protocol, it provides a controlled environment for running commands, particularly for Git operations, with features like environment variable filtering, PATH restriction, workspace boundary enforcement, and robust timeout handling. This change significantly improves the reliability and security posture of tool interactions within the system.

Highlights

  • Subprocess Sandbox Backend: Implemented a SubprocessSandbox backend that adheres to the SandboxBackend protocol, providing isolation for tool execution.
  • Environment Filtering: Introduced robust environment variable filtering using allowlists and denylists with fnmatch patterns, blocking library injection variables like LD_PRELOAD and PYTHONPATH.
  • PATH Restriction: Implemented filtering of PATH entries to include only known safe system directories, with a secure fallback mechanism.
  • Workspace Boundary Enforcement: Added enforcement to reject commands attempting to execute with a current working directory (cwd) outside the defined workspace.
  • Timeout and Process-Group Kill: Enhanced process management to kill entire process groups on Unix systems upon timeout, preventing orphaned grandchild processes.
  • Git Tool Integration: Integrated the new SandboxBackend into all six Git tools (GitStatusTool, GitLogTool, GitDiffTool, GitBranchTool, GitCommitTool, GitCloneTool), allowing them to leverage sandboxing with hardened environment variables.
  • Security Hardening: Improved security by filtering secret substrings from direct Git subprocess paths, using MappingProxyType for hardening overrides, and removing http:// from allowed Git clone schemes.
  • Correctness Fixes: Addressed several correctness issues, including proper handling of timeout=0.0, returncode=0 mapping, and raise from exc chaining.
  • Documentation Updates: Updated DESIGN_SPEC.md to reflect the new sandbox filenames, added missing files, updated implementation status, and included sandbox.py in the events listing.
Changelog
  • DESIGN_SPEC.md
    • Updated the status of SubprocessSandbox to 'Implemented' and refined its description.
    • Added sandbox.py to the observability events listing and updated file structure details.
  • src/ai_company/observability/events/sandbox.py
    • Added new constants for sandbox-related events.
  • src/ai_company/tools/__init__.py
    • Exported new sandbox-related classes and protocols.
  • src/ai_company/tools/_git_base.py
    • Integrated an optional SandboxBackend for Git tools.
    • Added environment hardening and refactored subprocess execution to support sandboxing.
    • Introduced secret-substring filtering for direct git subprocess paths and used MappingProxyType for hardening overrides.
  • src/ai_company/tools/git_tools.py
    • Modified Git tool constructors to accept an optional SandboxBackend.
    • Removed http:// from allowed Git clone schemes.
  • src/ai_company/tools/sandbox/__init__.py
    • Added package exports for the new sandbox module.
  • src/ai_company/tools/sandbox/config.py
    • Defined the SubprocessSandboxConfig model for sandbox configuration.
  • src/ai_company/tools/sandbox/errors.py
    • Defined a hierarchy of custom exceptions for sandbox-related errors.
  • src/ai_company/tools/sandbox/protocol.py
    • Defined the SandboxBackend protocol for pluggable sandbox implementations.
  • src/ai_company/tools/sandbox/result.py
    • Defined the SandboxResult model to encapsulate the outcome of sandboxed command executions.
  • src/ai_company/tools/sandbox/subprocess_sandbox.py
    • Implemented the SubprocessSandbox backend for executing commands with environment filtering, workspace enforcement, and timeout management.
  • tests/integration/tools/__init__.py
    • Added an __init__.py file to mark the directory as a Python package.
  • tests/integration/tools/conftest.py
    • Added a git_repo fixture for integration tests involving Git.
  • tests/integration/tools/test_sandbox_integration.py
    • Added integration tests for SubprocessSandbox with real Git commands, including workspace escape and timeout scenarios.
  • tests/unit/observability/test_events.py
    • Updated the test to include the new sandbox event module in discovery.
  • tests/unit/tools/git/test_git_sandbox_integration.py
    • Added unit tests for Git tools to verify their integration with the new sandbox backend, and backward compatibility without it.
  • tests/unit/tools/sandbox/__init__.py
    • Added an __init__.py file to mark the directory as a Python package.
  • tests/unit/tools/sandbox/conftest.py
    • Added pytest fixtures for sandbox configuration and SubprocessSandbox instances.
  • tests/unit/tools/sandbox/test_config.py
    • Added unit tests for the SubprocessSandboxConfig model, covering defaults, validation, and immutability.
  • tests/unit/tools/sandbox/test_errors.py
    • Added unit tests for the sandbox error hierarchy, verifying inheritance and context handling.
  • tests/unit/tools/sandbox/test_protocol.py
    • Added unit tests to verify that SandboxBackend is a runtime-checkable protocol and is satisfied by implementations.
  • tests/unit/tools/sandbox/test_result.py
    • Added unit tests for the SandboxResult model, covering success/failure conditions and immutability.
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
    • Added comprehensive unit tests for the SubprocessSandbox implementation, covering environment filtering, workspace boundaries, and command execution.
Activity
  • Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency).
  • 21 findings triaged and addressed.

@coderabbitai

coderabbitai bot commented Mar 7, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8fccc901-0cc6-4a0c-9651-cd7e3f9f3aff

📥 Commits

Reviewing files that changed from the base of the PR and between 158578b and b7fc1ce.

📒 Files selected for processing (11)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/observability/events/sandbox.py
  • src/ai_company/tools/_git_base.py
  • src/ai_company/tools/git_tools.py
  • src/ai_company/tools/sandbox/subprocess_sandbox.py
  • tests/integration/tools/conftest.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/git/test_git_sandbox_integration.py
  • tests/unit/tools/sandbox/test_result.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Built-in sandboxing for subprocess tools with workspace isolation, env filtering, PATH restrictions, and timeouts.
    • Git tools can optionally run via a sandbox backend for safer git operations.
  • Security

    • Hardened git environment variables and stricter clone URL validation (HTTPS, SSH, git, SCP-like; plain HTTP rejected).
  • Tests

    • Added integration and unit tests validating sandbox behavior, timeouts, workspace enforcement, and git integration.
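The stricter clone-URL rule (HTTPS, SSH, git, SCP-like; plain HTTP rejected) could be expressed roughly as follows. This is a hypothetical helper, not the PR's validation code; the extra guards against option-like hosts, Windows drive letters, and `ext::` transports are assumptions added for illustration.

```python
import re

# Schemes accepted as-is; plain http:// is deliberately absent.
_ALLOWED_SCHEMES = ("https://", "ssh://", "git://")

# SCP-like syntax, e.g. git@github.com:org/repo.git. The host must start with an
# alphanumeric character (blocks "-option" injection) and be at least two characters
# long (so "C:\path" is not a host). The text after ":" must not start with "//"
# (rejects http://, file://) or ":" (rejects ext:: transports).
_SCP_LIKE = re.compile(r"(?:[A-Za-z0-9][\w.-]*@)?[A-Za-z0-9][\w.-]+:(?!//)(?!:)\S+")


def is_allowed_clone_url(url: str) -> bool:
    if url.lower().startswith(_ALLOWED_SCHEMES):
        return True
    return _SCP_LIKE.fullmatch(url) is not None
```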

Walkthrough

Adds a pluggable SandboxBackend protocol, a SubprocessSandbox implementation with env/path/workspace controls and timeouts, integrates sandbox execution into git tools (optional injection), introduces sandbox observability events, and adds unit and integration tests exercising sandbox behavior and git tooling.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Sandbox core & models | src/ai_company/tools/sandbox/protocol.py, src/ai_company/tools/sandbox/result.py, src/ai_company/tools/sandbox/errors.py, src/ai_company/tools/sandbox/config.py, src/ai_company/tools/sandbox/__init__.py | Adds SandboxBackend protocol, SandboxResult model, SandboxError hierarchy, SubprocessSandboxConfig Pydantic model, and package exports to define the sandbox public API. |
| Subprocess sandbox implementation | src/ai_company/tools/sandbox/subprocess_sandbox.py | Implements SubprocessSandbox: env allowlist/denylist, PATH restriction, workspace-scoped cwd validation, timed execution with process-group termination, health_check/cleanup, and observability logging. |
| Git tooling integration | src/ai_company/tools/_git_base.py, src/ai_company/tools/git_tools.py | Extends _BaseGitTool and git tool constructors to accept an optional SandboxBackend; routes git execution to sandbox when present, adds git env hardening overrides, secret-strip logic, sandbox result translation, and tightened clone URL scheme validation (removed http://). |
| Top-level tools exports | src/ai_company/tools/__init__.py | Re-exports sandbox types and classes (SandboxBackend, SandboxError variants, SandboxResult, SubprocessSandbox, SubprocessSandboxConfig) from tools package for public access. |
| Observability events | src/ai_company/observability/events/sandbox.py | Adds SANDBOX_* event constants (execute start/success/failed/timeout, spawn failed, env filtered, workspace violation, cleanup, path fallback, health_check, kill failed). |
| Docs / design spec | DESIGN_SPEC.md, CLAUDE.md | Updates design documentation to mark SubprocessSandbox implemented and to reflect sandbox-aware git tooling and planned Docker/K8s backends. |
| Tests — unit & integration | tests/unit/tools/sandbox/*, tests/unit/tools/git/test_git_sandbox_integration.py, tests/unit/observability/test_events.py, tests/integration/tools/* | Adds extensive unit tests for sandbox models, protocol, config, errors, SubprocessSandbox behavior; unit and integration tests exercising git tools with and without sandbox, workspace-escape and timeout cases, and integration fixtures/helpers. |
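The workspace-scoped cwd validation in the table above can be sketched with `pathlib`. `validate_cwd` and `WorkspaceViolationError` are hypothetical names (the PR raises a `SandboxError` subclass), and resolving symlinks before the check is an assumption about the desired policy.

```python
from __future__ import annotations

from pathlib import Path


class WorkspaceViolationError(Exception):
    """Hypothetical name; the PR raises one of its SandboxError subclasses."""


def validate_cwd(workspace: Path, cwd: Path | None) -> Path:
    """Resolve cwd against the workspace and reject anything outside it."""
    root = workspace.resolve()
    # An absolute cwd replaces root in the join, so "/etc" is still checked.
    target = root if cwd is None else (root / cwd).resolve()
    # resolve() collapses ".." and follows symlinks; is_relative_to compares whole
    # path components, so a sibling such as /ws-evil is not inside /ws.
    if not target.is_relative_to(root):
        raise WorkspaceViolationError(f"cwd {target} is outside workspace {root}")
    return target
```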

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant GitTool as GitStatusTool
    participant Sandbox as SubprocessSandbox
    participant OS as OS Process

    Client->>GitTool: request status(workspace, sandbox?)
    activate GitTool
    GitTool->>GitTool: _validate_path()
    alt sandbox provided
        GitTool->>Sandbox: execute(command="git", args=["status"...], cwd, env_overrides, timeout)
        activate Sandbox
        Sandbox->>Sandbox: _build_filtered_env()
        Note right of Sandbox: apply allowlist, denylist, PATH restriction
        Sandbox->>Sandbox: _validate_cwd(cwd) 
        Sandbox->>OS: create_subprocess_exec(command,args) (with timeout/process group)
        activate OS
        OS-->>Sandbox: stdout, stderr, returncode (or timeout)
        deactivate OS
        Sandbox->>GitTool: SandboxResult(returncode, stdout, stderr, timed_out)
        deactivate Sandbox
    else no sandbox
        GitTool->>OS: subprocess.run(command,args,env)
        activate OS
        OS-->>GitTool: CompletedProcess
        deactivate OS
    end
    GitTool->>GitTool: translate result -> ToolExecutionResult
    GitTool-->>Client: return ToolExecutionResult
    deactivate GitTool

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.15% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly summarizes the main feature: implementing subprocess sandbox for tool execution isolation, which is the core of the changeset. |
| Description check | ✅ Passed | Description is comprehensive and directly related to the changeset, detailing the subprocess sandbox implementation, features, integration, and test results. |
| Linked Issues check | ✅ Passed | All major coding requirements from issue #131 are met: SandboxBackend protocol defined [protocol.py], subprocess sandbox backend with environment filtering and workspace boundary enforcement [subprocess_sandbox.py, config.py], integration with git tools [_git_base.py, git_tools.py], event constants [sandbox.py], and comprehensive tests. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #131 requirements: sandbox infrastructure, git tool integration, event constants, and supporting tests. Minor changes like http:// removal from clone schemes and DESIGN_SPEC updates are directly related to the sandbox implementation scope. |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust subprocess sandbox for tool execution isolation. While the implementation shows strong attention to security details and includes comprehensive tests, a high-severity vulnerability was identified in the PATH filtering logic of the sandbox. The current _filter_path implementation can be bypassed using path traversal or sibling directory naming, potentially allowing the execution of untrusted binaries from the host system. Addressing this critical issue by properly resolving and validating path entries is essential. Additionally, there are two minor suggestions to improve code clarity and maintainability in the new modules.

Comment on lines +125 to +129

    filtered = [
        e
        for e in entries
        if any(e.lower().startswith(prefix.lower()) for prefix in safe_prefixes)
    ]

Severity: high (security)

The _filter_path method is vulnerable to a PATH restriction bypass. It uses a simple string prefix check (startswith) on raw path entries without resolving them first. This allows an attacker to bypass the restriction using path traversal (e.g., /usr/bin/../../tmp/evil) or sibling directories (e.g., /usr/bin-extra). This defeats the purpose of the sandbox's PATH restriction security control.

To remediate this, resolve each path entry using Path(e).resolve() before checking it against the safe prefixes, and ensure the check verifies that the entry is actually a subpath of the prefix.
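A sketch of the remediation the reviewer describes: resolve each entry, then test containment component-wise with `PurePath.is_relative_to` (Python 3.9+) instead of a raw string prefix. `filter_path` and its `fallback` parameter are hypothetical; the fallback mirrors the PR's stated rule of never exposing the unfiltered PATH. On Windows, `WindowsPath` comparisons are already case-insensitive, so the `.lower()` calls are no longer needed.

```python
from __future__ import annotations

import os
from collections.abc import Sequence
from pathlib import Path


def filter_path(
    path_value: str, safe_prefixes: Sequence[str], fallback: Sequence[str]
) -> str:
    """Keep only PATH entries that resolve inside a safe directory."""
    safe = [Path(p).resolve() for p in safe_prefixes]
    kept: list[str] = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # an empty entry means "current directory" on POSIX; never keep it
        resolved = Path(entry).resolve()
        # resolve() collapses "/usr/bin/../../tmp"; is_relative_to compares whole
        # components, so "/usr/bin-extra" is not inside "/usr/bin".
        if any(resolved.is_relative_to(p) for p in safe):
            kept.append(str(resolved))  # store the resolved form, not the raw string
    # Secure fallback: never return the unfiltered PATH.
    return os.pathsep.join(kept or fallback)
```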

Comment on lines +103 to +106

    if check_pattern == check_name:
        return True
    if fnmatch.fnmatch(check_name, check_pattern):
        return True

Severity: medium

The explicit equality check check_pattern == check_name is redundant because fnmatch.fnmatch() already handles exact matches. You can simplify this logic to a single fnmatch call.

Suggested change (replace these lines):

    if check_pattern == check_name:
        return True
    if fnmatch.fnmatch(check_name, check_pattern):
        return True

with:

    if fnmatch.fnmatch(check_name, check_pattern):
        return True

@greptile-apps

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR introduces a well-designed subprocess sandbox backend (SubprocessSandbox) for tool execution isolation, wires it into all six git tools via _BaseGitTool, and ships a comprehensive test suite (unit + integration). The security layering — environment allowlist/denylist, library-injection var blocking, PATH restriction with prefix-boundary checking, workspace enforcement, and process-group kill — is solid and appropriately documented.

Key findings:

  • Error message bug in GitCloneTool: adjacent string literals produce "...git://and SCP-like..." (missing space) in the URL rejection message.
  • Behavioural asymmetry on timeout: the sandbox path captures partial stdout/stderr after a kill and surfaces it via SandboxResult; the direct subprocess path (no sandbox) drains the process but discards the output, returning only a generic timeout string. Not a correctness bug, but makes the non-sandbox path harder to debug on long-running git operations (e.g. git clone).
  • _get_safe_path_prefixes on Windows hardcodes C:\Program Files\Git paths unconditionally; these are not governed by SubprocessSandboxConfig, making them non-configurable for callers that need a stricter allowlist.
  • All previously identified issues from the pre-review thread (double proc.kill(), duplicate SANDBOX_EXECUTE_TIMEOUT events) have been correctly addressed.

Confidence Score: 4/5

  • Safe to merge after fixing the clone URL error message; all other findings are non-blocking style improvements.
  • The implementation is architecturally sound with thorough test coverage (96% coverage, 2387 passing tests). The only functional defect is a cosmetic string-concatenation bug in an error message. The timeout-output asymmetry and non-configurable Windows PATH entries are low-severity style issues that don't affect correctness or security.
  • src/ai_company/tools/git_tools.py (error message fix) and src/ai_company/tools/sandbox/subprocess_sandbox.py (Windows PATH prefix configurability).

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/ai_company/tools/sandbox/subprocess_sandbox.py | Core sandbox implementation. Environment filtering, workspace validation, process-group kill, and timeout handling are all well-structured. One minor point: hardcoded Windows Git paths in _get_safe_path_prefixes are not configurable. The previously flagged double-kill race is fixed and the duplicate TIMEOUT event is resolved by using SANDBOX_KILL_FAILED for the drain timeout. |
| src/ai_company/tools/_git_base.py | Good sandbox integration. MappingProxyType, secret-substring stripping, and raise from exc chaining are all correct. Minor behavioural asymmetry: direct-path discards partial output after kill while sandbox path captures it; covered by comment above. |
| src/ai_company/tools/git_tools.py | All 6 git tools correctly accept the optional SandboxBackend. http:// correctly removed from allowed clone schemes. One string-concatenation bug in the clone URL rejection message (missing space before "and"). |
| src/ai_company/tools/sandbox/config.py | Frozen Pydantic model with sensible defaults. Denylist covers known library-injection vars (LD_PRELOAD, PYTHONPATH, DYLD_INSERT_LIBRARIES, etc.). timeout_seconds constrained to (0, 600]. |
| src/ai_company/tools/sandbox/protocol.py | @runtime_checkable Protocol with correct # noqa: TC003 annotation for Path. Clean definition; no issues found. |
| src/ai_company/tools/sandbox/result.py | Frozen Pydantic result model using @computed_field for success. success correctly requires returncode == 0 AND not timed_out. |
| src/ai_company/tools/sandbox/errors.py | Minimal, correct error hierarchy. SandboxTimeoutError is correctly documented as reserved for future backends; SubprocessSandbox signals timeouts via SandboxResult.timed_out. |
| src/ai_company/observability/events/sandbox.py | 11 event constants covering the full lifecycle. SANDBOX_KILL_FAILED was correctly added to disambiguate the kill-drain timeout from the normal execution timeout. |
| tests/unit/tools/sandbox/test_subprocess_sandbox.py | Comprehensive unit test coverage: constructor validation, env filtering, PATH restriction, workspace boundary, zero-timeout, process kill, health-check, and cleanup. All edge cases from the PR description are covered. |
| tests/integration/tools/test_sandbox_integration.py | Good E2E integration coverage: real git repo + sandbox + GitStatusTool, workspace escape, and timeout on slow command. |
| tests/unit/tools/git/test_git_sandbox_integration.py | Unit tests for all 5 (non-clone) git tools exercising the sandboxed code path using a mock sandbox backend. No issues. |

Sequence Diagram

sequenceDiagram
    participant GT as GitTool
    participant BG as _BaseGitTool._run_git
    participant SB as SubprocessSandbox.execute
    participant DP as Direct Subprocess

    GT->>BG: _run_git(args, cwd, deadline)
    BG->>BG: _validate_cwd / _validate_path

    alt sandbox injected
        BG->>SB: execute(command="git", args, cwd, env_overrides, timeout)
        SB->>SB: _validate_cwd (workspace boundary)
        SB->>SB: _build_filtered_env (allowlist + denylist + PATH filter)
        SB->>SB: _spawn_process (start_new_session=True on Unix)
        SB->>SB: _communicate_with_timeout(deadline)
        alt timeout
            SB->>SB: _kill_process (killpg → proc.kill fallback)
            SB->>SB: _drain_after_kill (5s wait)
            SB-->>BG: SandboxResult(timed_out=True, stdout, stderr)
        else success / failure
            SB-->>BG: SandboxResult(returncode, stdout, stderr)
        end
        BG->>BG: _sandbox_result_to_execution_result
    else no sandbox
        BG->>DP: _run_git_direct(args, work_dir, deadline)
        DP->>DP: _build_git_env (inherit + hardening + strip secrets)
        DP->>DP: _start_git_process
        DP->>DP: _await_git_process(deadline)
        alt timeout
            DP->>DP: proc.kill + drain (5s, output discarded)
            DP-->>BG: ToolExecutionResult(is_error=True, "timed out")
        else done
            DP->>DP: _process_git_output
            DP-->>BG: ToolExecutionResult
        end
    end

    BG-->>GT: ToolExecutionResult

Comments Outside Diff (1)

  1. src/ai_company/tools/_git_base.py, lines 308-324

    Partial output discarded on timeout in the direct path

    When a git process times out, the direct path (_await_git_process) drains the process after kill but silently discards the output from the second communicate() call:

    try:
        await asyncio.wait_for(proc.communicate(), timeout=5.0)
    except TimeoutError:
        logger.warning(GIT_COMMAND_FAILED, ...)
    logger.warning(GIT_COMMAND_TIMEOUT, ...)
    return ToolExecutionResult(content=f"Git command timed out after {deadline}s", ...)

    The sandbox path (_communicate_with_timeout + _drain_after_kill) captures this same output and propagates stdout/stderr through SandboxResult, then _sandbox_result_to_execution_result uses result.stderr as the content. This means a timed-out git clone via sandbox might surface a partial error message from git (e.g. a DNS failure), while the same operation without a sandbox only surfaces the generic "timed out" string.

    The divergence isn't a correctness bug, but it makes the non-sandboxed path harder to debug. Consider storing the result of the drain communicate() and including stderr in the returned ToolExecutionResult.content, mirroring the sandbox path's behaviour.
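One way to keep partial output on timeout, in either path, is sketched below under stated assumptions. `run_with_timeout` and `_kill_process_tree` are hypothetical names, not the PR's `_await_git_process` or `_drain_after_kill`. Rather than calling `communicate()` a second time, it reads both pipes in background tasks and only bounds the wait for process exit, because output already consumed by a `communicate()` that `wait_for` cancelled may be lost. On Unix it starts the child in a new session and kills the whole group with `os.killpg`, as the PR describes; on Windows it falls back to `proc.kill()`.

```python
from __future__ import annotations

import asyncio
import os
import signal
import sys


def _kill_process_tree(proc: asyncio.subprocess.Process) -> None:
    try:
        if os.name != "nt":
            # start_new_session=True made the child a group leader (pgid == pid),
            # so this also kills any grandchildren it spawned.
            os.killpg(proc.pid, signal.SIGKILL)
        else:
            proc.kill()  # no killpg on Windows; only the direct child is killed
    except ProcessLookupError:
        pass  # the group already exited


async def run_with_timeout(
    argv: list[str], timeout: float, drain_timeout: float = 5.0
) -> tuple[int | None, bytes, bytes, bool]:
    """Return (returncode, stdout, stderr, timed_out), keeping partial output."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=os.name != "nt",
    )
    assert proc.stdout is not None and proc.stderr is not None  # PIPE guarantees both
    # Read both pipes in the background so output survives a timeout.
    readers = asyncio.gather(proc.stdout.read(), proc.stderr.read())
    timed_out = False
    try:
        await asyncio.wait_for(proc.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        timed_out = True
        _kill_process_tree(proc)
    stdout = stderr = b""
    try:
        # After a kill the pipes reach EOF, so this normally returns quickly.
        stdout, stderr = await asyncio.wait_for(readers, timeout=drain_timeout)
    except asyncio.TimeoutError:
        pass  # drain stuck, e.g. an unkilled grandchild still holds the pipe
    try:
        await asyncio.wait_for(proc.wait(), timeout=drain_timeout)  # reap the child
    except asyncio.TimeoutError:
        pass
    return proc.returncode, stdout, stderr, timed_out


if __name__ == "__main__":
    slow = "import time; print('partial', flush=True); time.sleep(30)"
    print(asyncio.run(run_with_timeout([sys.executable, "-c", slow], timeout=1.0)))
```

Feeding the returned `stderr` into `ToolExecutionResult.content` would give the direct path the same diagnostics as the sandbox path.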


Last reviewed commit: b7fc1ce


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/observability/test_events.py (1)

80-98: ⚠️ Potential issue | 🟡 Minor

Add exact-value assertions for the new sandbox events.

Line 92 updates discovery only. This file still lacks a test_sandbox_events_exist() equivalent, so a typo in any SANDBOX_* value would still pass the generic string/pattern/duplicate checks and only break observability consumers at runtime.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/observability/test_events.py` around lines 80 - 98, Add a new unit
test named test_sandbox_events_exist that imports the events.sandbox module and
asserts exact string values for each SANDBOX_* constant (e.g., SANDBOX_FOO,
SANDBOX_BAR — replace with the actual constant names present in events.sandbox)
rather than just checking discovery; for each constant (SANDBOX_...) assert
equality against the expected literal event name string so any typo or
accidental change fails the test.
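Below is a self-contained sketch of such an exact-value check. The constant names come from event constants mentioned elsewhere in this review. Their string values are placeholders, because the real values in `events.sandbox` are not shown here. `EXPECTED_SANDBOX_EVENTS` and `sandbox_event_mismatches` are hypothetical. The check compares the set of `SANDBOX_*` names as well as the values, so it also catches constants that are added or removed without a matching test update.

```python
from types import ModuleType

# Placeholder values: copy the real strings from
# ai_company.observability.events.sandbox before using this.
EXPECTED_SANDBOX_EVENTS: dict[str, str] = {
    "SANDBOX_EXECUTE_START": "sandbox.execute.start",
    "SANDBOX_EXECUTE_SUCCESS": "sandbox.execute.success",
    "SANDBOX_EXECUTE_TIMEOUT": "sandbox.execute.timeout",
}


def sandbox_event_mismatches(
    module: ModuleType, expected: dict[str, str]
) -> list[str]:
    """Describe how *module*'s SANDBOX_* constants differ from *expected*.

    An empty list means the names and values match exactly.
    """
    actual = {n: getattr(module, n) for n in dir(module) if n.startswith("SANDBOX_")}
    problems = [f"missing: {n}" for n in sorted(expected.keys() - actual.keys())]
    problems += [f"unexpected: {n}" for n in sorted(actual.keys() - expected.keys())]
    problems += [
        f"{n}: {actual[n]!r} != {expected[n]!r}"
        for n in sorted(expected.keys() & actual.keys())
        if actual[n] != expected[n]
    ]
    return problems
```

In the suite, `test_sandbox_events_exist` could assert that this helper returns an empty list for the real module. Alternatively, it could feed `EXPECTED_SANDBOX_EVENTS.items()` to `@pytest.mark.parametrize`.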
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1690-1691: The spec is inconsistent with the table: update all
stale references so SubprocessSandbox is consistently shown as implemented and
git handling reflects the current code (remove or revise the statement that "git
clone accepts `http://`" and instead document the actual allowed URL schemes and
sandbox behavior), and change the "sandbox backends for git" section that still
lists them as planned (around the paragraph referencing DockerSandbox and
subprocess backends) to describe the current implemented SubprocessSandbox
behavior and limitations; search for and edit occurrences of
"SubprocessSandbox", "DockerSandbox", "git clone accepts `http://`", and the
"sandbox backends for git" paragraph to align wording, allowed URL schemes, and
status with the table.

In `@src/ai_company/tools/_git_base.py`:
- Around line 451-461: The except block that currently catches all Exceptions
and returns a ToolExecutionResult should be changed to only handle
sandbox-related errors (the SandboxError family); in the function/method
containing this block replace the generic "except Exception as exc" with an
explicit except for SandboxError (or the specific sandbox base exception types
used in this module), log and convert those to ToolExecutionResult as done now
(using GIT_COMMAND_FAILED, _sanitize_command, logger.error and the existing
content), but for any other unexpected exceptions re-raise them after logging
(do not return a ToolExecutionResult) so programmer bugs and fatal errors (e.g.,
MemoryError, RecursionError) bubble up instead of being swallowed.

In `@src/ai_company/tools/git_tools.py`:
- Around line 27-31: The docstring and user-facing validation in GitCloneTool
are out of sync with _ALLOWED_CLONE_SCHEMES (which no longer allows "http://");
update the GitCloneTool class docstring and the validation error message to list
the current allowed schemes ("https://", "ssh://", "git://") instead of
including "http://", and preferably generate the message from the
_ALLOWED_CLONE_SCHEMES tuple so the text stays consistent with that constant.

In `@src/ai_company/tools/sandbox/protocol.py`:
- Around line 8-11: The NameError arises because Mapping and SandboxResult are
only imported inside the TYPE_CHECKING block but are used at runtime in the
SandboxBackend protocol (see SandboxBackend and its env_overrides: Mapping[str,
str] | None annotation); move the imports of Mapping (from collections.abc) and
SandboxResult (from ai_company.tools.sandbox.result) out of the TYPE_CHECKING
block to top-level imports so those names exist when the class/protocol is
defined (alternatively, keep them in TYPE_CHECKING and make the annotations
stringified or enable from __future__ import annotations, but the requested fix
is to relocate the imports for Mapping and SandboxResult out of the
TYPE_CHECKING guard).

In `@src/ai_company/tools/sandbox/subprocess_sandbox.py`:
- Line 231: Remove the stale mypy suppression on the os.killpg call: delete the
trailing " # type: ignore[attr-defined]" from the line containing
os.killpg(os.getpgid(proc.pid), signal.SIGKILL) in subprocess_sandbox.py so mypy
no longer reports an unused type-ignore; ensure imports (os, signal) remain
intact and run type checking to confirm no new errors.
- Around line 238-354: The execute() method is too large; split it into small
helpers to isolate validation, env building, spawning, timeout handling, and
result decoding—for example create private methods
_prepare_work_dir_and_env(work_dir, env_overrides) that calls _validate_cwd and
_build_filtered_env, _spawn_process(command, args, work_dir, env) that wraps
asyncio.create_subprocess_exec and raises SandboxStartError on OSError,
_communicate_with_timeout(proc, effective_timeout) that implements the
asyncio.wait_for/proc.communicate timeout/killing logic and uses _kill_process,
and _finalize_result(proc, stdout_bytes, stderr_bytes) that decodes bytes, logs
using SANDBOX_EXECUTE_FAILED/SANDBOX_EXECUTE_SUCCESS, and returns a
SandboxResult; then refactor execute() to call these helpers so each function
stays under ~50 lines and security-sensitive steps are easier to audit.
- Around line 74-80: Before raising the ValueError in the workspace validation
checks inside the constructor (e.g., SubprocessSandbox.__init__), log the
failure at WARNING or ERROR level with context; specifically call the module or
instance logger (logger.error or self.logger.error) with a message that includes
the problematic workspace value when workspace.is_absolute() is false and
include the resolved path when resolved.is_dir() is false, then raise the
existing ValueError as before.
- Around line 271-277: Logger calls in subprocess_sandbox are currently emitting
raw args (e.g. the SANDBOX_EXECUTE_START debug that logs command and args),
which can leak credentials; update all sandbox logger statements (the
SANDBOX_EXECUTE_START, SANDBOX_EXECUTE_RESULT, SANDBOX_EXECUTE_ERROR usages
around the shown locations) to avoid logging raw args by either removing args
from the structured payload or replacing them with a sanitized/redacted version
(use or add a small helper like redact_sensitive_args/sanitize_args and call it
before logging), and ensure cwd and timeout remain safe to log while args are
never logged raw.
- Around line 123-138: The PATH filtering currently uses startswith and can be
spoofed (e.g., "/usr/bin-malicious"); update the logic in the function using
_get_safe_path_prefixes (the block that computes filtered from path_value and
safe_prefixes) to perform a robust boundary check: normalize/resolve each entry
and each safe_prefix (handling Windows case-insensitivity), then accept an entry
only if it is equal to a safe_prefix or its path components begin with the
safe_prefix as a distinct directory boundary (e.g., entry == prefix or
entry.startswith(prefix + os.sep) after normalization; _PATH_SEP splits PATH entries and is not the directory separator), and keep the existing
fallback that logs SANDBOX_PATH_FALLBACK and returns only actual directories
from safe_prefixes. Ensure you reference variables path_value, safe_prefixes,
filtered, and SANDBOX_PATH_FALLBACK when making the change.
- Around line 302-329: The timeout handler currently kills the child process via
_kill_process(proc) and calls await proc.communicate() but throws away the
returned stdout/stderr; update the except TimeoutError block to capture the
bytes returned by the second communicate() call (e.g., stdout_bytes,
stderr_bytes = await asyncio.wait_for(proc.communicate(), timeout=5.0)), decode
them to strings (or keep bytes if SandboxResult expects bytes) and include those
partial outputs in the returned SandboxResult instead of empty strings; also
keep existing logger.warning calls (SANDBOX_EXECUTE_TIMEOUT, command, args,
etc.) and ensure the final returned SandboxResult (constructed in the except
block) uses the captured stdout/stderr and still sets returncode=-1 and
timed_out=True.
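For the logging item above, here is a sketch of a redaction helper. It assumes credentials mostly reach argv as URL userinfo (`https://user:token@host/...`). `redact_args` is a hypothetical name. Other secret forms, such as `-c` config values that carry headers, would need additional rules.

```python
import re
from collections.abc import Sequence

# Matches the userinfo part ("user:token@") right after "scheme://".
_URL_USERINFO = re.compile(r"(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*://)[^/@\s]+@")


def redact_args(args: Sequence[str]) -> tuple[str, ...]:
    """Return a copy of *args* with URL credentials masked, safe to log."""
    return tuple(_URL_USERINFO.sub(r"\g<scheme>***@", arg) for arg in args)
```

A call site would then log `args=redact_args(args)` instead of the raw tuple.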

In `@tests/integration/tools/conftest.py`:
- Around line 9-18: The _GIT_ENV dict currently copies os.environ which
preserves ambient GIT_* settings; make the git fixture hermetic by either
starting from an empty dict or by filtering out existing GIT_* keys before
applying test defaults. Update the module-level _GIT_ENV in
tests/integration/tools/conftest.py so it does not inherit GIT_DIR,
GIT_WORK_TREE, GIT_INDEX_FILE, GIT_CONFIG_GLOBAL, etc. (e.g., create a new dict
and then set the explicit GIT_* entries used by tests, or copy os.environ but
remove keys that start with "GIT_"), leaving the rest of the test defaults
unchanged.

In `@tests/unit/tools/git/test_git_sandbox_integration.py`:
- Around line 22-61: Add a new async test method (e.g., test_clone_with_sandbox)
that mirrors the other sandbox tests but exercises GitCloneTool: create a
SubprocessSandbox, instantiate GitCloneTool(workspace=<some path>,
sandbox=sandbox), call tool.execute with arguments containing a safe file:// URL
derived from the existing git_repo fixture (or git_repo.as_uri()) and a target
path/name, and assert the result is not an error (and optionally that the target
clone directory exists). This ensures the GitCloneTool (and its URL validation
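and _CLONE_TIMEOUT behavior) is exercised under a sandboxed SubprocessSandbox. Note that file:// is not among the allowed clone schemes listed above (https://, ssh://, git://), so the test may need to account for GitCloneTool's URL validation.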

In `@tests/unit/tools/sandbox/test_result.py`:
- Around line 14-46: Combine the four separate tests into one parametrized
pytest function that covers the same cases: create a single test function (e.g.,
test_sandboxresult_success_matrix) using `@pytest.mark.parametrize` to pass
different tuples of (stdout, stderr, returncode, timed_out, expected_success),
instantiate SandboxResult with those parameters, and assert result.success ==
expected_success; reference the existing SandboxResult class and its success
property to build each case: ("ok", "", 0, False, True),
("", "error", 1, False, False), ("", "timeout", 0, True, False),
("", "", -1, True, False).

In `@tests/unit/tools/sandbox/test_subprocess_sandbox.py`:
- Around line 238-255: Add a regression test that exercises timeout=0.0
(distinct from None) by creating a new async test (e.g.,
test_zero_timeout_kills_process) using the same pattern as
test_timeout_kills_process: call SubprocessSandbox.execute with timeout=0.0,
branch on os.name to use the same Windows ("cmd", "/c", "ping", ...) and POSIX
("sleep", "10") commands, and assert result.timed_out is True and result.success
is False; this ensures the sandbox logic treats 0.0 as an explicit immediate
timeout rather than falling back to config.timeout_seconds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2e6243be-d570-44c3-aed7-7f6ad02e23fc

📥 Commits

Reviewing files that changed from the base of the PR and between c57a6a9 and 158578b.

📒 Files selected for processing (23)
  • DESIGN_SPEC.md
  • src/ai_company/observability/events/sandbox.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/_git_base.py
  • src/ai_company/tools/git_tools.py
  • src/ai_company/tools/sandbox/__init__.py
  • src/ai_company/tools/sandbox/config.py
  • src/ai_company/tools/sandbox/errors.py
  • src/ai_company/tools/sandbox/protocol.py
  • src/ai_company/tools/sandbox/result.py
  • src/ai_company/tools/sandbox/subprocess_sandbox.py
  • tests/integration/tools/__init__.py
  • tests/integration/tools/conftest.py
  • tests/integration/tools/test_sandbox_integration.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/git/test_git_sandbox_integration.py
  • tests/unit/tools/sandbox/__init__.py
  • tests/unit/tools/sandbox/conftest.py
  • tests/unit/tools/sandbox/test_config.py
  • tests/unit/tools/sandbox/test_errors.py
  • tests/unit/tools/sandbox/test_protocol.py
  • tests/unit/tools/sandbox/test_result.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations in Python code — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — ruff enforces PEP 758 except syntax on Python 3.14.
All public functions and classes must have type hints. Use mypy strict mode.
Use Google style docstrings, required on all public classes and functions (enforced by ruff D rules).
Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task. Existing code is being migrated incrementally.
Keep functions under 50 lines and files under 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries (user input, external APIs, config files).
Set line length to 88 characters (ruff).

Files:

  • tests/integration/tools/__init__.py
  • tests/unit/tools/sandbox/conftest.py
  • src/ai_company/tools/sandbox/result.py
  • tests/unit/tools/sandbox/test_config.py
  • tests/unit/observability/test_events.py
  • src/ai_company/observability/events/sandbox.py
  • tests/integration/tools/conftest.py
  • tests/integration/tools/test_sandbox_integration.py
  • src/ai_company/tools/sandbox/__init__.py
  • src/ai_company/tools/sandbox/subprocess_sandbox.py
  • tests/unit/tools/sandbox/__init__.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/git_tools.py
  • tests/unit/tools/sandbox/test_result.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/tools/sandbox/test_protocol.py
  • src/ai_company/tools/sandbox/config.py
  • src/ai_company/tools/sandbox/errors.py
  • tests/unit/tools/sandbox/test_errors.py
  • src/ai_company/tools/sandbox/protocol.py
  • src/ai_company/tools/_git_base.py
  • tests/unit/tools/git/test_git_sandbox_integration.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Maintain 80% minimum code coverage (enforced in CI).
Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/integration/tools/__init__.py
  • tests/unit/tools/sandbox/conftest.py
  • tests/unit/tools/sandbox/test_config.py
  • tests/unit/observability/test_events.py
  • tests/integration/tools/conftest.py
  • tests/integration/tools/test_sandbox_integration.py
  • tests/unit/tools/sandbox/__init__.py
  • tests/unit/tools/sandbox/test_result.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/tools/sandbox/test_protocol.py
  • tests/unit/tools/sandbox/test_errors.py
  • tests/unit/tools/git/test_git_sandbox_integration.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name for the logger (not _logger, not log).
Use event name constants from the domain-specific module under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Always use structured logging kwargs: logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/tools/sandbox/result.py
  • src/ai_company/observability/events/sandbox.py
  • src/ai_company/tools/sandbox/__init__.py
  • src/ai_company/tools/sandbox/subprocess_sandbox.py
  • src/ai_company/tools/__init__.py
  • src/ai_company/tools/git_tools.py
  • src/ai_company/tools/sandbox/config.py
  • src/ai_company/tools/sandbox/errors.py
  • src/ai_company/tools/sandbox/protocol.py
  • src/ai_company/tools/_git_base.py
🧠 Learnings (1)
📚 Learning: 2026-03-07T10:57:52.980Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T10:57:52.980Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/ai_company/observability/events/sandbox.py
🧬 Code graph analysis (12)
tests/unit/tools/sandbox/test_config.py (2)
src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
  • config (85-87)
src/ai_company/tools/sandbox/config.py (1)
  • SubprocessSandboxConfig (6-60)
tests/integration/tools/conftest.py (3)
src/ai_company/tools/_git_base.py (1)
  • _run_git (388-420)
src/ai_company/engine/agent_engine.py (1)
  • run (109-192)
src/ai_company/tools/permissions.py (1)
  • check (150-171)
src/ai_company/tools/sandbox/__init__.py (5)
src/ai_company/tools/sandbox/subprocess_sandbox.py (2)
  • config (85-87)
  • SubprocessSandbox (48-373)
src/ai_company/tools/sandbox/config.py (1)
  • SubprocessSandboxConfig (6-60)
src/ai_company/tools/sandbox/errors.py (3)
  • SandboxError (10-11)
  • SandboxStartError (25-26)
  • SandboxTimeoutError (14-22)
src/ai_company/tools/sandbox/protocol.py (1)
  • SandboxBackend (15-74)
src/ai_company/tools/sandbox/result.py (1)
  • SandboxResult (6-28)
src/ai_company/tools/__init__.py (5)
src/ai_company/tools/sandbox/protocol.py (1)
  • SandboxBackend (15-74)
src/ai_company/tools/sandbox/errors.py (2)
  • SandboxError (10-11)
  • SandboxStartError (25-26)
src/ai_company/tools/sandbox/result.py (1)
  • SandboxResult (6-28)
src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
  • SubprocessSandbox (48-373)
src/ai_company/tools/sandbox/config.py (1)
  • SubprocessSandboxConfig (6-60)
src/ai_company/tools/git_tools.py (3)
src/ai_company/tools/sandbox/protocol.py (1)
  • SandboxBackend (15-74)
src/ai_company/tools/_git_base.py (1)
  • workspace (127-129)
src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
  • workspace (90-92)
tests/unit/tools/sandbox/test_result.py (2)
src/ai_company/tools/sandbox/result.py (2)
  • SandboxResult (6-28)
  • success (26-28)
tests/unit/tools/sandbox/test_config.py (1)
  • test_frozen (46-49)
tests/unit/tools/sandbox/test_subprocess_sandbox.py (3)
src/ai_company/tools/sandbox/subprocess_sandbox.py (9)
  • config (85-87)
  • SubprocessSandbox (48-373)
  • workspace (90-92)
  • _build_filtered_env (153-196)
  • _validate_cwd (198-219)
  • execute (238-354)
  • health_check (360-369)
  • cleanup (356-358)
  • get_backend_type (371-373)
src/ai_company/tools/sandbox/config.py (1)
  • SubprocessSandboxConfig (6-60)
src/ai_company/tools/sandbox/errors.py (2)
  • SandboxError (10-11)
  • SandboxStartError (25-26)
tests/unit/tools/sandbox/test_protocol.py (4)
src/ai_company/tools/sandbox/protocol.py (5)
  • SandboxBackend (15-74)
  • execute (24-50)
  • cleanup (52-58)
  • health_check (60-66)
  • get_backend_type (68-74)
src/ai_company/tools/sandbox/result.py (1)
  • SandboxResult (6-28)
tests/unit/tools/sandbox/conftest.py (1)
  • subprocess_sandbox (24-32)
src/ai_company/tools/sandbox/subprocess_sandbox.py (5)
  • SubprocessSandbox (48-373)
  • execute (238-354)
  • cleanup (356-358)
  • health_check (360-369)
  • get_backend_type (371-373)
src/ai_company/tools/sandbox/errors.py (1)
src/ai_company/tools/errors.py (1)
  • ToolError (12-44)
tests/unit/tools/sandbox/test_errors.py (2)
src/ai_company/tools/errors.py (1)
  • ToolError (12-44)
src/ai_company/tools/sandbox/errors.py (3)
  • SandboxError (10-11)
  • SandboxStartError (25-26)
  • SandboxTimeoutError (14-22)
src/ai_company/tools/sandbox/protocol.py (3)
src/ai_company/tools/sandbox/result.py (1)
  • SandboxResult (6-28)
src/ai_company/tools/sandbox/subprocess_sandbox.py (4)
  • execute (238-354)
  • cleanup (356-358)
  • health_check (360-369)
  • get_backend_type (371-373)
tests/unit/tools/sandbox/test_protocol.py (4)
  • execute (18-31)
  • cleanup (33-34)
  • health_check (36-37)
  • get_backend_type (39-40)
src/ai_company/tools/_git_base.py (4)
src/ai_company/tools/sandbox/errors.py (1)
  • SandboxError (10-11)
src/ai_company/tools/sandbox/protocol.py (2)
  • SandboxBackend (15-74)
  • execute (24-50)
src/ai_company/tools/sandbox/result.py (1)
  • SandboxResult (6-28)
src/ai_company/tools/sandbox/subprocess_sandbox.py (2)
  • workspace (90-92)
  • execute (238-354)
🪛 GitHub Actions: CI
src/ai_company/tools/sandbox/subprocess_sandbox.py

[error] 231-231: mypy: Unused "type: ignore" comment [unused-ignore].
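Below is a hedged sketch of one way to drop the ignore without losing Windows support. mypy evaluates `sys.platform` checks and skips the branch that cannot run on the platform being checked, so `os.killpg` needs no `type: ignore` on either side. The sketch assumes the child was started with `start_new_session=True`, so it leads its own process group. `kill_process_tree` is a hypothetical name, and `subprocess.Popen` stands in for the asyncio process used in the PR.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(proc: subprocess.Popen[bytes]) -> None:
    """Kill *proc* and, on POSIX, every process in its process group.

    Assumes the child was started with ``start_new_session=True``, so its
    group id equals its pid and the caller's own group is never signalled.
    """
    if sys.platform != "win32":
        # mypy evaluates sys.platform checks, so os.killpg needs no ignore.
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        except ProcessLookupError:
            pass  # the whole group already exited
    else:
        proc.kill()  # no os.killpg on Windows
```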

🔇 Additional comments (3)
src/ai_company/tools/sandbox/errors.py (1)

10-26: Clean error layering.

Keeping sandbox exceptions rooted in ToolError gives callers one consistent error path and preserves the shared immutable context behavior.

src/ai_company/tools/sandbox/result.py (1)

6-28: Nice use of a frozen computed result model.

Deriving success from returncode and timed_out avoids redundant state and keeps the execution result immutable.

src/ai_company/observability/events/sandbox.py (1)

5-14: Good addition of a dedicated sandbox event domain.

Centralizing the SANDBOX_* names here keeps call sites aligned with the observability import pattern. Based on learnings, "Use event name constants from the domain-specific module under ai_company.observability.events."

Comment on lines +451 to +461
except Exception as exc:
    logger.error(
        GIT_COMMAND_FAILED,
        command=_sanitize_command(["git", *args]),
        error=f"Unexpected sandbox error: {exc}",
        exc_info=True,
    )
    return ToolExecutionResult(
        content=f"Sandbox error: {exc}",
        is_error=True,
    )

⚠️ Potential issue | 🟠 Major

Don't turn unexpected sandbox failures into normal tool errors.

Lines 451-460 catch every Exception from the sandbox and convert it into ToolExecutionResult. That masks programmer bugs and suppresses non-recoverable failures like MemoryError and RecursionError instead of letting them abort the run. Only SandboxError-family failures should be translated here; unexpected exceptions should be logged and re-raised.

Suggested fix
-        except Exception as exc:
-            logger.error(
-                GIT_COMMAND_FAILED,
-                command=_sanitize_command(["git", *args]),
-                error=f"Unexpected sandbox error: {exc}",
-                exc_info=True,
-            )
-            return ToolExecutionResult(
-                content=f"Sandbox error: {exc}",
-                is_error=True,
-            )
+        except (MemoryError, RecursionError):
+            raise
+        except Exception as exc:
+            logger.error(
+                GIT_COMMAND_FAILED,
+                command=_sanitize_command(["git", *args]),
+                error=f"Unexpected sandbox error: {exc}",
+                exc_info=True,
+            )
+            raise
As per coding guidelines, "Handle errors explicitly, never silently swallow exceptions."
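Below is a self-contained sketch of the narrowing the prose above describes: translate only SandboxError-family failures into a tool result, and log and re-raise everything else. `SandboxError` and `ToolExecutionResult` are minimal stand-ins, not the real classes. `run_in_sandbox` is hypothetical, and the standard-library logger stands in for the project's `get_logger`.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

logger = logging.getLogger(__name__)  # stand-in for ai_company's get_logger


class SandboxError(Exception):
    """Stand-in for the sandbox error family (rooted in ToolError in the PR)."""


@dataclass(frozen=True)
class ToolExecutionResult:
    """Stand-in for the real tool result model."""

    content: str
    is_error: bool = False


async def run_in_sandbox(
    execute: Callable[[], Awaitable[str]],
) -> ToolExecutionResult:
    try:
        output = await execute()
    except SandboxError as exc:
        # Expected, recoverable failure: report it as a tool error.
        logger.error("sandbox execution failed", extra={"error": str(exc)})
        return ToolExecutionResult(content=f"Sandbox error: {exc}", is_error=True)
    except Exception:
        # Anything else is a bug: log with traceback, then let it propagate.
        logger.exception("unexpected error during sandbox execution")
        raise
    return ToolExecutionResult(content=output)


if __name__ == "__main__":

    async def _failing() -> str:
        raise SandboxError("could not start process")

    print(asyncio.run(run_in_sandbox(_failing)))
```

`MemoryError` and `RecursionError` are `Exception` subclasses, so they propagate through the second clause. `asyncio.CancelledError` derives from `BaseException` and is not caught at all.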

Comment on lines +8 to +11
if TYPE_CHECKING:
    from collections.abc import Mapping

    from ai_company.tools.sandbox.result import SandboxResult

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path
from typing import get_type_hints
import importlib.util

path = Path("src/ai_company/tools/sandbox/protocol.py")
spec = importlib.util.spec_from_file_location("sandbox_protocol", path)
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(module)

print(get_type_hints(module.SandboxBackend.execute))
PY

Repository: Aureliolo/ai-company

Length of output: 633


Move Mapping and SandboxResult imports outside TYPE_CHECKING block.

Running the script fails with NameError: name 'Mapping' is not defined. These types are used in annotations on the public SandboxBackend protocol (line 30: env_overrides: Mapping[str, str] | None), but they are only imported under TYPE_CHECKING. The codebase does not use from __future__ import annotations, so these annotations are really evaluated. Before Python 3.14, that happens when the method is defined, so importing the module fails. With PEP 649 on Python 3.14, evaluation is deferred, and the NameError surfaces whenever the annotations are introspected, as get_type_hints() does in the script above.


Copilot AI left a comment

Pull request overview

Implements a subprocess-based sandbox backend for isolating tool execution, adds environment/PATH/workspace restrictions, and wires optional sandbox execution into git tools while expanding observability and test coverage.

Changes:

  • Added SubprocessSandbox backend with env filtering, restricted PATH, workspace boundary checks, and timeout + process-group kill handling.
  • Integrated optional SandboxBackend into git tool execution path and tightened clone scheme allowlist.
  • Added sandbox observability events and comprehensive unit/integration tests; updated design spec accordingly.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/ai_company/tools/sandbox/subprocess_sandbox.py | New subprocess sandbox implementation (env filtering, PATH restriction, workspace enforcement, timeout handling, logging). |
| src/ai_company/tools/sandbox/config.py | New frozen config model for sandbox defaults and validation. |
| src/ai_company/tools/sandbox/errors.py | New sandbox error hierarchy rooted in ToolError. |
| src/ai_company/tools/sandbox/protocol.py | New SandboxBackend protocol (execute/cleanup/health_check/type). |
| src/ai_company/tools/sandbox/result.py | New frozen SandboxResult model with computed success. |
| src/ai_company/tools/sandbox/__init__.py | Package exports for sandbox public API. |
| src/ai_company/tools/_git_base.py | Adds sandbox-aware execution path, hardening env overrides, and extra defense-in-depth env filtering for direct subprocess. |
| src/ai_company/tools/git_tools.py | Adds optional sandbox injection to all git tools; removes http:// from allowed clone schemes. |
| src/ai_company/tools/__init__.py | Re-exports sandbox types from the top-level tools package. |
| src/ai_company/observability/events/sandbox.py | Adds SANDBOX_* event constants. |
| tests/unit/tools/sandbox/conftest.py | Adds fixtures for sandbox unit tests. |
| tests/unit/tools/sandbox/test_subprocess_sandbox.py | Unit tests for sandbox constructor, env filtering, cwd enforcement, execute, timeout, and health/cleanup. |
| tests/unit/tools/sandbox/test_config.py | Unit tests for config defaults, validation, and immutability. |
| tests/unit/tools/sandbox/test_errors.py | Unit tests for sandbox error hierarchy and context immutability. |
| tests/unit/tools/sandbox/test_protocol.py | Unit tests verifying runtime-checkable protocol behavior. |
| tests/unit/tools/sandbox/test_result.py | Unit tests for SandboxResult semantics and frozen behavior. |
| tests/unit/tools/sandbox/__init__.py | Marks sandbox unit test package. |
| tests/unit/tools/git/test_git_sandbox_integration.py | Unit-level integration tests ensuring git tools work with/without sandbox and surface sandbox failures. |
| tests/unit/observability/test_events.py | Includes sandbox in the discovered observability domains list. |
| tests/integration/tools/conftest.py | Adds integration fixture for creating a real git repo for sandbox e2e tests. |
| tests/integration/tools/test_sandbox_integration.py | E2E tests running real git commands via sandbox and validating workspace/timeout behavior. |
| tests/integration/tools/__init__.py | Marks integration tools test package. |
| DESIGN_SPEC.md | Updates sandbox backend status/details and directory/event listings. |


Comment on lines +123 to +129
safe_prefixes = self._get_safe_path_prefixes()
entries = path_value.split(_PATH_SEP)
filtered = [
e
for e in entries
if any(e.lower().startswith(prefix.lower()) for prefix in safe_prefixes)
]

Copilot AI Mar 7, 2026


_filter_path() currently treats a PATH entry as safe if it merely starts with a safe prefix. This can be bypassed with traversal components (e.g. /usr/bin/../../home/user/bin) that still start with /usr/bin but resolve outside the intended safe directories. Consider normalizing each entry (e.g., Path(entry).resolve() when possible) and then checking it is equal to or contained within a safe directory, or at least rejecting entries containing .. segments.
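The follow-up commit in this PR describes hardening this with `normpath` plus an `os.sep` boundary check. The following is a minimal sketch of that idea. It assumes a hypothetical standalone `filter_path(path_value, safe_prefixes)`, not the sandbox's actual method. Each entry is normalized before comparison, so `..` segments cannot climb out of a safe directory. A prefix also matches only at a directory boundary, so `/usr/binevil` does not pass for `/usr/bin`. Unlike the `Path.resolve()` option mentioned above, `normpath` does not follow symlinks.

```python
import os


def _norm(path: str) -> str:
    # Collapse "..", ".", and duplicate separators; lower-case on Windows.
    return os.path.normcase(os.path.normpath(path))


def _is_within(entry: str, prefix: str) -> bool:
    """True if ``entry`` equals ``prefix`` or lies below it."""
    base = prefix.rstrip(os.sep) + os.sep  # boundary: "/usr/bin" -> "/usr/bin/"
    return entry == prefix or entry.startswith(base)


def filter_path(path_value: str, safe_prefixes: tuple[str, ...]) -> str:
    """Keep only PATH entries that normalize to a location under a safe prefix.

    Hypothetical helper: kept entries are returned in normalized form so the
    string that was checked is the string the child process will search.
    """
    prefixes = [_norm(p) for p in safe_prefixes]
    kept: list[str] = []
    for raw in path_value.split(os.pathsep):
        if not raw:
            continue  # an empty entry means "current directory" on POSIX
        entry = _norm(raw)
        if any(_is_within(entry, p) for p in prefixes):
            kept.append(entry)
    return os.pathsep.join(kept)
```

With this check, `/usr/bin/../../home/u/bin` is evaluated as `/home/u/bin` and dropped. `/usr/bin/../bin` normalizes to `/usr/bin` and is kept.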

Comment on lines +110 to +115
"""Check if an env var name matches any denylist pattern."""
upper = name.upper()
return any(
fnmatch.fnmatch(upper, pat) for pat in self._config.env_denylist_patterns
)


Copilot AI Mar 7, 2026


_matches_denylist() uppercases the env var name but compares it against patterns without normalizing the pattern case. This makes denylist behavior depend on whether callers provide uppercase patterns (works with defaults) and can fail for lowercase custom patterns. Consider uppercasing (or casefold()-ing) the patterns once (e.g., in config validation or at comparison time) for consistent case-insensitive matching.

Suggested change
"""Check if an env var name matches any denylist pattern."""
upper = name.upper()
return any(
fnmatch.fnmatch(upper, pat) for pat in self._config.env_denylist_patterns
)
"""Check if an env var name matches any denylist pattern.
Uses case-insensitive matching on Windows where env var names
are case-insensitive, mirroring allowlist behavior.
"""
check_name = name.upper() if os.name == "nt" else name
for pattern in self._config.env_denylist_patterns:
check_pattern = pattern.upper() if os.name == "nt" else pattern
if fnmatch.fnmatch(check_name, check_pattern):
return True
return False
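The follow-up commit settled on uppercasing both the name and the pattern. The following sketch shows that approach, written as a hypothetical free function `matches_denylist`. It uses `fnmatch.fnmatchcase` because plain `fnmatch.fnmatch` passes both arguments through `os.path.normcase`. That makes `fnmatch.fnmatch` case-insensitive on Windows but case-sensitive on POSIX. This sketch applies the uppercase comparison on every platform. That is a deliberately conservative choice: a denylist pattern then blocks a variable however either side is cased.

```python
import fnmatch
from collections.abc import Iterable


def matches_denylist(name: str, patterns: Iterable[str]) -> bool:
    """Case-insensitive glob match of an env var name against denylist patterns."""
    upper = name.upper()
    # Normalize the pattern too, so "*token*" and "*TOKEN*" behave the same.
    # fnmatchcase does no os-specific case folding of its own.
    return any(fnmatch.fnmatchcase(upper, pat.upper()) for pat in patterns)
```

In a real config, the patterns could be uppercased once at validation time instead of on every call.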

logger.warning(
GIT_COMMAND_TIMEOUT,
command=_sanitize_command(["git", *args]),
deadline="sandbox",

Copilot AI Mar 7, 2026


In the sandboxed timeout path, GIT_COMMAND_TIMEOUT is logged with deadline="sandbox", whereas the direct path logs a numeric deadline (seconds). Mixing types for the same field can break downstream parsing/metrics. Consider logging the actual numeric timeout used (available from the sandbox call) and/or using a separate field (e.g. backend="sandbox") to indicate the source.

Suggested change
deadline="sandbox",
backend="sandbox",
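One way to keep each field's type stable is to build the timeout fields in one place. The sketch below uses a hypothetical `timeout_log_fields` helper. `deadline` is always a float number of seconds, and the execution path goes into a separate `backend` field. A call site could then spread the result into the structured logger, for example `logger.warning(GIT_COMMAND_TIMEOUT, **timeout_log_fields(...))`. This assumes the logger accepts arbitrary keyword fields, as structlog does.

```python
from typing import Literal, TypedDict


class TimeoutLogFields(TypedDict):
    command: list[str]
    deadline: float  # always seconds, whichever backend ran the command
    backend: Literal["direct", "sandbox"]


def timeout_log_fields(
    command: list[str],
    timeout_seconds: float,
    backend: Literal["direct", "sandbox"],
) -> TimeoutLogFields:
    """Build GIT_COMMAND_TIMEOUT fields with one stable type per key."""
    return {
        "command": command,
        "deadline": float(timeout_seconds),
        "backend": backend,
    }
```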

…, Greptile, and Copilot

- Harden PATH filtering with normpath + os.sep boundary check (prefix spoofing fix)
- Fix case-insensitive denylist matching (uppercase both name and pattern)
- Add credential redaction in logged args via _redact_args helper
- Split execute() into _spawn_process, _communicate_with_timeout, _drain_after_kill
- Fix _kill_process double-kill + add ProcessLookupError handling
- Add SANDBOX_KILL_FAILED event constant for unkillable processes
- Remove broad except Exception from _run_git_sandboxed
- Add _sandbox_result_to_execution_result deadline parameter
- Make _MAX_COUNT_LIMIT Final, update GitCloneTool docstring
- Update DESIGN_SPEC.md §11.1.1, §11.2, §15.5 for sandbox status
- Update CLAUDE.md package structure with sandbox
- Make git test fixtures hermetic (GIT_CONFIG_GLOBAL=os.devnull)
- Add tests: env_overrides bypass, prefix spoofing, path fallback, zero timeout
- Add parametrized SandboxResult success matrix test
- Add sandbox events existence test
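Several of these items fit into one control flow: the split into spawn / communicate-with-timeout / drain-after-kill, the `ProcessLookupError` handling, and the zero-timeout test. The following is a minimal POSIX-only asyncio sketch of that flow. It uses hypothetical names (`run_with_timeout`, `RunResult`, `DEFAULT_TIMEOUT`), not the sandbox's real API. The child starts in a new session, so it leads its own process group. On timeout, `os.killpg` kills the whole group, then the pipes are drained so the child is reaped. Output read before the timeout may be lost in this sketch.

```python
from __future__ import annotations

import asyncio
import os
import signal
from dataclasses import dataclass

DEFAULT_TIMEOUT = 30.0  # hypothetical default, not the project's value


@dataclass(frozen=True)
class RunResult:
    stdout: str
    stderr: str
    returncode: int
    timed_out: bool


def _kill_group(pid: int) -> None:
    """SIGKILL every process in the group led by ``pid`` (POSIX only)."""
    try:
        os.killpg(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the group already exited between the timeout and the kill


async def run_with_timeout(
    argv: list[str], timeout: float | None = None
) -> RunResult:
    # `is None` rather than `or`, so an explicit timeout=0.0 is honoured.
    effective = DEFAULT_TIMEOUT if timeout is None else timeout
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        start_new_session=True,  # child becomes leader of a new process group
    )
    timed_out = False
    try:
        out, err = await asyncio.wait_for(proc.communicate(), effective)
    except asyncio.TimeoutError:
        timed_out = True
        # Kill grandchildren too, so none of them keeps the pipes open.
        _kill_group(proc.pid)
        # Drain whatever is left and reap the child so no zombie remains.
        out, err = await proc.communicate()
    returncode = proc.returncode
    assert returncode is not None  # set once communicate() has reaped it
    # A successful exit stays 0; it is never remapped to a sentinel like -1.
    return RunResult(
        stdout=out.decode(errors="replace"),
        stderr=err.decode(errors="replace"),
        returncode=returncode,
        timed_out=timed_out,
    )
```

Killing only `proc.pid` would leave a backgrounded grandchild (for example `sh -c "sleep 5 & sleep 5"`) holding stdout open. The drain step would then block until that grandchild exited on its own.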

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit 3c8394e into main Mar 7, 2026
5 of 7 checks passed
@Aureliolo Aureliolo deleted the feat/subprocess-sandbox branch March 7, 2026 12:23
Comment on lines +194 to +202
if os.name == "nt":
system_root = os.environ.get("SYSTEMROOT", r"C:\WINDOWS")
return (
system_root,
str(Path(system_root) / "system32"),
r"C:\Program Files\Git",
r"C:\Program Files (x86)\Git",
)
return ("/usr/bin", "/usr/local/bin", "/bin", "/usr/sbin", "/sbin")


Hardcoded Windows Git paths bypass caller-controlled PATH restriction

_get_safe_path_prefixes unconditionally includes r"C:\Program Files\Git" and r"C:\Program Files (x86)\Git" as safe path prefixes regardless of env_allowlist or env_denylist_patterns. A caller who instantiates the sandbox with an explicit SubprocessSandboxConfig cannot opt out of these entries because the prefix list is hard-wired in a @staticmethod. If the intent is to restrict git to a particular installation (or to deny git entirely), those entries would need to be removed manually.

This is unlikely to cause issues for the current use case (git tools always need git on PATH), but the tight coupling between "safe PATH prefixes" and the sandbox config is worth documenting — or the prefix list could be made a config field so callers can override it.
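The following sketches the second option using assumed names. A frozen `PathPolicy` dataclass stands in for a field on `SubprocessSandboxConfig`. `None` means "use the platform defaults" and an empty tuple means "allow no PATH entries". A caller can therefore drop the Git directories or pin a different install. The defaults reproduce the list quoted above. `ntpath.join` is used so the Windows branch builds the same string on any host.

```python
from __future__ import annotations

import ntpath
import os
from collections.abc import Mapping
from dataclasses import dataclass


def default_safe_prefixes(
    os_name: str = os.name, environ: Mapping[str, str] = os.environ
) -> tuple[str, ...]:
    """Platform defaults mirroring the hard-coded list in the review."""
    if os_name == "nt":
        system_root = environ.get("SYSTEMROOT", r"C:\WINDOWS")
        return (
            system_root,
            ntpath.join(system_root, "system32"),
            r"C:\Program Files\Git",
            r"C:\Program Files (x86)\Git",
        )
    return ("/usr/bin", "/usr/local/bin", "/bin", "/usr/sbin", "/sbin")


@dataclass(frozen=True)
class PathPolicy:
    """Hypothetical config: None = platform defaults, () = allow nothing."""

    safe_path_prefixes: tuple[str, ...] | None = None

    def effective_prefixes(self, os_name: str = os.name) -> tuple[str, ...]:
        if self.safe_path_prefixes is None:
            return default_safe_prefixes(os_name)
        # An explicit value, even an empty tuple, replaces the defaults.
        return self.safe_path_prefixes
```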


Comment on lines +690 to 692
f"Invalid clone URL. Only {schemes}"
"and SCP-like (user@host:path) URLs are "
"allowed"


Missing space in error message string concatenation

The two adjacent string literals are concatenated without a space. The resulting message reads "Invalid clone URL. Only https://, ssh://, git://and SCP-like…" — the last scheme runs directly into the word "and" with no separator.

Suggested change
f"Invalid clone URL. Only {schemes}"
"and SCP-like (user@host:path) URLs are "
"allowed"
content=(
f"Invalid clone URL. Only {schemes} "
"and SCP-like (user@host:path) URLs are "
"allowed"
),
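Building the message with a single join removes the dependence on a trailing space inside adjacent literals. The sketch below also shows the allowlist check that the message describes. `http://` is absent, matching this PR's removal of it. The scheme tuple is inferred from the example message in this comment. The SCP-like regex and the function names are illustrative assumptions, not the code in `git_tools.py`.

```python
from __future__ import annotations

import re

# Inferred from the review's example message; http:// is deliberately absent.
ALLOWED_SCHEMES: tuple[str, ...] = ("https://", "ssh://", "git://")

# Rough user@host:path form; the (?!//) guard rejects "host://" lookalikes.
_SCP_LIKE = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+:(?!//)\S+$")


def invalid_clone_url_message(schemes: tuple[str, ...] = ALLOWED_SCHEMES) -> str:
    # Joining whole words with " " means no separator can be forgotten.
    return " ".join(
        [
            "Invalid clone URL. Only",
            ", ".join(schemes),
            "and SCP-like (user@host:path) URLs are allowed",
        ]
    )


def validate_clone_url(url: str) -> str | None:
    """Return None if ``url`` is acceptable, else the error message."""
    if url.startswith(ALLOWED_SCHEMES) or _SCP_LIKE.match(url):
        return None
    return invalid_clone_url_message()
```

The resulting message reads "Invalid clone URL. Only https://, ssh://, git:// and SCP-like (user@host:path) URLs are allowed".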

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Implement subprocess sandbox for file and git tools (DESIGN_SPEC §11.1.2)

2 participants