feat: implement subprocess sandbox for tool execution isolation (#131)#153
feat: implement subprocess sandbox for tool execution isolation (#131)#153
Conversation
Add a pluggable SandboxBackend protocol with a subprocess-based implementation that provides environment filtering, workspace boundary enforcement, timeout management, and PATH restriction for git tools. - SandboxBackend protocol (runtime_checkable) with execute/cleanup/health_check - SubprocessSandbox implementation with env allowlist/denylist, PATH filtering - SandboxResult frozen model with computed success field - SubprocessSandboxConfig with timeout, workspace_only, restricted_path options - SandboxError hierarchy inheriting from ToolError - Git tools integration: optional sandbox injection, dual code path in _run_git - Sandbox event constants for observability - 100% test coverage on sandbox module (unit + integration) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rage Pre-reviewed by 10 agents, 21 findings addressed. Security: - Block library injection env vars (LD_PRELOAD, PYTHONPATH, etc.) in denylist - PATH fallback uses safe dirs instead of unfiltered original PATH - Remove http:// from allowed clone schemes (cleartext credential risk) - Add secret-substring filtering in direct git subprocess path - Case-insensitive env var matching on Windows Correctness: - Fix timeout=0.0 silently using default (use `is not None` check) - Fix returncode=0 mapped to -1 (explicit None check) - Add process-group kill on Unix to prevent orphaned grandchildren - Use `raise from exc` instead of `from None` for proper chaining Quality: - Extract shared _init_repo fixture, remove test code duplication - Add sandbox tests for GitDiffTool, GitBranchTool, GitCommitTool - Add denylist test for library injection entries - Use MappingProxyType for _GIT_HARDENING_OVERRIDES - Update DESIGN_SPEC.md: fix sandbox filenames, add missing files, update implementation status, add sandbox.py to events listing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new subprocess-based sandboxing mechanism to enhance the security and isolation of external tool executions. By implementing a Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (11)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a pluggable SandboxBackend protocol, a SubprocessSandbox implementation with env/path/workspace controls and timeouts, integrates sandbox execution into git tools (optional injection), introduces sandbox observability events, and adds unit and integration tests exercising sandbox behavior and git tooling. Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant GitTool as GitStatusTool
participant Sandbox as SubprocessSandbox
participant OS as OS Process
Client->>GitTool: request status(workspace, sandbox?)
activate GitTool
GitTool->>GitTool: _validate_path()
alt sandbox provided
GitTool->>Sandbox: execute(command="git", args=["status"...], cwd, env_overrides, timeout)
activate Sandbox
Sandbox->>Sandbox: _build_filtered_env()
Note right of Sandbox: apply allowlist, denylist, PATH restriction
Sandbox->>Sandbox: _validate_cwd(cwd)
Sandbox->>OS: create_subprocess_exec(command,args) (with timeout/process group)
activate OS
OS-->>Sandbox: stdout, stderr, returncode (or timeout)
deactivate OS
Sandbox->>GitTool: SandboxResult(returncode, stdout, stderr, timed_out)
deactivate Sandbox
else no sandbox
GitTool->>OS: subprocess.run(command,args,env)
activate OS
OS-->>GitTool: CompletedProcess
deactivate OS
end
GitTool->>GitTool: translate result -> ToolExecutionResult
GitTool-->>Client: return ToolExecutionResult
deactivate GitTool
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a robust subprocess sandbox for tool execution isolation. While the implementation shows strong attention to security details and includes comprehensive tests, a high-severity vulnerability was identified in the PATH filtering logic of the sandbox. The current _filter_path implementation can be bypassed using path traversal or sibling directory naming, potentially allowing the execution of untrusted binaries from the host system. Addressing this critical issue by properly resolving and validating path entries is essential. Additionally, there are two minor suggestions to improve code clarity and maintainability in the new modules.
| filtered = [ | ||
| e | ||
| for e in entries | ||
| if any(e.lower().startswith(prefix.lower()) for prefix in safe_prefixes) | ||
| ] |
There was a problem hiding this comment.
The _filter_path method is vulnerable to a PATH restriction bypass. It uses a simple string prefix check (startswith) on raw path entries without resolving them first. This allows an attacker to bypass the restriction using path traversal (e.g., /usr/bin/../../tmp/evil) or sibling directories (e.g., /usr/bin-extra). This defeats the purpose of the sandbox's PATH restriction security control.
To remediate this, resolve each path entry using Path(e).resolve() before checking it against the safe prefixes, and ensure the check verifies that the entry is actually a subpath of the prefix.
| if check_pattern == check_name: | ||
| return True | ||
| if fnmatch.fnmatch(check_name, check_pattern): | ||
| return True |
There was a problem hiding this comment.
The explicit equality check check_pattern == check_name is redundant because fnmatch.fnmatch() already handles exact matches. You can simplify this logic to a single fnmatch call.
| if check_pattern == check_name: | |
| return True | |
| if fnmatch.fnmatch(check_name, check_pattern): | |
| return True | |
| if fnmatch.fnmatch(check_name, check_pattern): | |
| return True |
Greptile SummaryThis PR introduces a well-designed subprocess sandbox backend ( Key findings:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant GT as GitTool
participant BG as _BaseGitTool._run_git
participant SB as SubprocessSandbox.execute
participant DP as Direct Subprocess
GT->>BG: _run_git(args, cwd, deadline)
BG->>BG: _validate_cwd / _validate_path
alt sandbox injected
BG->>SB: execute(command="git", args, cwd, env_overrides, timeout)
SB->>SB: _validate_cwd (workspace boundary)
SB->>SB: _build_filtered_env (allowlist + denylist + PATH filter)
SB->>SB: _spawn_process (start_new_session=True on Unix)
SB->>SB: _communicate_with_timeout(deadline)
alt timeout
SB->>SB: _kill_process (killpg → proc.kill fallback)
SB->>SB: _drain_after_kill (5s wait)
SB-->>BG: SandboxResult(timed_out=True, stdout, stderr)
else success / failure
SB-->>BG: SandboxResult(returncode, stdout, stderr)
end
BG->>BG: _sandbox_result_to_execution_result
else no sandbox
BG->>DP: _run_git_direct(args, work_dir, deadline)
DP->>DP: _build_git_env (inherit + hardening + strip secrets)
DP->>DP: _start_git_process
DP->>DP: _await_git_process(deadline)
alt timeout
DP->>DP: proc.kill + drain (5s, output discarded)
DP-->>BG: ToolExecutionResult(is_error=True, "timed out")
else done
DP->>DP: _process_git_output
DP-->>BG: ToolExecutionResult
end
end
BG-->>GT: ToolExecutionResult
|
There was a problem hiding this comment.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/observability/test_events.py (1)
80-98:⚠️ Potential issue | 🟡 MinorAdd exact-value assertions for the new sandbox events.
Line 92 updates discovery only. This file still lacks a
test_sandbox_events_exist()equivalent, so a typo in anySANDBOX_*value would still pass the generic string/pattern/duplicate checks and only break observability consumers at runtime.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/observability/test_events.py` around lines 80 - 98, Add a new unit test named test_sandbox_events_exist that imports the events.sandbox module and asserts exact string values for each SANDBOX_* constant (e.g., SANDBOX_FOO, SANDBOX_BAR — replace with the actual constant names present in events.sandbox) rather than just checking discovery; for each constant (SANDBOX_...) assert equality against the expected literal event name string so any typo or accidental change fails the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1690-1691: The spec is inconsistent with the table: update all
stale references so SubprocessSandbox is consistently shown as implemented and
git handling reflects the current code (remove or revise the statement that "git
clone accepts `http://`" and instead document the actual allowed URL schemes and
sandbox behavior), and change the "sandbox backends for git" section that still
lists them as planned (around the paragraph referencing DockerSandbox and
subprocess backends) to describe the current implemented SubprocessSandbox
behavior and limitations; search for and edit occurrences of
"SubprocessSandbox", "DockerSandbox", "git clone accepts `http://`", and the
"sandbox backends for git" paragraph to align wording, allowed URL schemes, and
status with the table.
In `@src/ai_company/tools/_git_base.py`:
- Around line 451-461: The except block that currently catches all Exceptions
and returns a ToolExecutionResult should be changed to only handle
sandbox-related errors (the SandboxError family); in the function/method
containing this block replace the generic "except Exception as exc" with an
explicit except for SandboxError (or the specific sandbox base exception types
used in this module), log and convert those to ToolExecutionResult as done now
(using GIT_COMMAND_FAILED, _sanitize_command, logger.error and the existing
content), but for any other unexpected exceptions re-raise them after logging
(do not return a ToolExecutionResult) so programmer bugs and fatal errors (e.g.,
MemoryError, RecursionError) bubble up instead of being swallowed.
In `@src/ai_company/tools/git_tools.py`:
- Around line 27-31: The docstring and user-facing validation in GitCloneTool
are out of sync with _ALLOWED_CLONE_SCHEMES (which no longer allows "http://");
update the GitCloneTool class docstring and the validation error message to list
the current allowed schemes ("https://", "ssh://", "git://") instead of
including "http://", and preferably generate the message from the
_ALLOWED_CLONE_SCHEMES tuple so the text stays consistent with that constant.
In `@src/ai_company/tools/sandbox/protocol.py`:
- Around line 8-11: The TypeError arises because Mapping and SandboxResult are
only imported inside the TYPE_CHECKING block but are used at runtime in the
SandboxBackend protocol (see SandboxBackend and its env_overrides: Mapping[str,
str] | None annotation); move the imports of Mapping (from collections.abc) and
SandboxResult (from ai_company.tools.sandbox.result) out of the TYPE_CHECKING
block to top-level imports so those names exist when the class/protocol is
defined (alternatively, keep them in TYPE_CHECKING and make the annotations
stringified or enable from __future__ import annotations, but the requested fix
is to relocate the imports for Mapping and SandboxResult out of the
TYPE_CHECKING guard).
In `@src/ai_company/tools/sandbox/subprocess_sandbox.py`:
- Line 231: Remove the stale mypy suppression on the os.killpg call: delete the
trailing " # type: ignore[attr-defined]" from the line containing
os.killpg(os.getpgid(proc.pid), signal.SIGKILL) in subprocess_sandbox.py so mypy
no longer reports an unused type-ignore; ensure imports (os, signal) remain
intact and run type checking to confirm no new errors.
- Around line 238-354: The execute() method is too large; split it into small
helpers to isolate validation, env building, spawning, timeout handling, and
result decoding—for example create private methods
_prepare_work_dir_and_env(work_dir, env_overrides) that calls _validate_cwd and
_build_filtered_env, _spawn_process(command, args, work_dir, env) that wraps
asyncio.create_subprocess_exec and raises SandboxStartError on OSError,
_communicate_with_timeout(proc, effective_timeout) that implements the
asyncio.wait_for/proc.communicate timeout/killing logic and uses _kill_process,
and _finalize_result(proc, stdout_bytes, stderr_bytes) that decodes bytes, logs
using SANDBOX_EXECUTE_FAILED/SANDBOX_EXECUTE_SUCCESS, and returns a
SandboxResult; then refactor execute() to call these helpers so each function
stays under ~50 lines and security-sensitive steps are easier to audit.
- Around line 74-80: Before raising the ValueError in the workspace validation
checks inside the constructor (e.g., SubprocessSandbox.__init__), log the
failure at WARNING or ERROR level with context; specifically call the module or
instance logger (logger.error or self.logger.error) with a message that includes
the problematic workspace value when workspace.is_absolute() is false and
include the resolved path when resolved.is_dir() is false, then raise the
existing ValueError as before.
- Around line 271-277: Logger calls in subprocess_sandbox are currently emitting
raw args (e.g. the SANDBOX_EXECUTE_START debug that logs command and args),
which can leak credentials; update all sandbox logger statements (the
SANDBOX_EXECUTE_START, SANDBOX_EXECUTE_RESULT, SANDBOX_EXECUTE_ERROR usages
around the shown locations) to avoid logging raw args by either removing args
from the structured payload or replacing them with a sanitized/redacted version
(use or add a small helper like redact_sensitive_args/sanitize_args and call it
before logging), and ensure cwd and timeout remain safe to log while args are
never logged raw.
- Around line 123-138: The PATH filtering currently uses startswith and can be
spoofed (e.g., "/usr/bin-malicious"); update the logic in the function using
_get_safe_path_prefixes (the block that computes filtered from path_value and
safe_prefixes) to perform a robust boundary check: normalize/resolve each entry
and each safe_prefix (handling Windows case-insensitivity), then accept an entry
only if it is equal to a safe_prefix or its path components begin with the
safe_prefix as a distinct directory boundary (e.g., entry == prefix or
entry.startswith(prefix + _PATH_SEP) after normalization), and keep the existing
fallback that logs SANDBOX_PATH_FALLBACK and returns only actual directories
from safe_prefixes. Ensure you reference variables path_value, safe_prefixes,
filtered, and SANDBOX_PATH_FALLBACK when making the change.
- Around line 302-329: The timeout handler currently kills the child process via
_kill_process(proc) and calls await proc.communicate() but throws away the
returned stdout/stderr; update the except TimeoutError block to capture the
bytes returned by the second communicate() call (e.g., stdout_bytes,
stderr_bytes = await asyncio.wait_for(proc.communicate(), timeout=5.0)), decode
them to strings (or keep bytes if SandboxResult expects bytes) and include those
partial outputs in the returned SandboxResult instead of empty strings; also
keep existing logger.warning calls (SANDBOX_EXECUTE_TIMEOUT, command, args,
etc.) and ensure the final returned SandboxResult (constructed in the except
block) uses the captured stdout/stderr and still sets returncode=-1 and
timed_out=True.
In `@tests/integration/tools/conftest.py`:
- Around line 9-18: The _GIT_ENV dict currently copies os.environ which
preserves ambient GIT_* settings; make the git fixture hermetic by either
starting from an empty dict or by filtering out existing GIT_* keys before
applying test defaults. Update the module-level _GIT_ENV in
tests/integration/tools/conftest.py so it does not inherit GIT_DIR,
GIT_WORK_TREE, GIT_INDEX_FILE, GIT_CONFIG_GLOBAL, etc. (e.g., create a new dict
and then set the explicit GIT_* entries used by tests, or copy os.environ but
remove keys that start with "GIT_"), leaving the rest of the test defaults
unchanged.
In `@tests/unit/tools/git/test_git_sandbox_integration.py`:
- Around line 22-61: Add a new async test method (e.g., test_clone_with_sandbox)
that mirrors the other sandbox tests but exercises GitCloneTool: create a
SubprocessSandbox, instantiate GitCloneTool(workspace=<some path>,
sandbox=sandbox), call tool.execute with arguments containing a safe file:// URL
derived from the existing git_repo fixture (or git_repo.as_uri()) and a target
path/name, and assert the result is not an error (and optionally that the target
clone directory exists). This ensures the GitCloneTool (and its URL validation
and _CLONE_TIMEOUT behavior) is exercised under a sandboxed SubprocessSandbox.
In `@tests/unit/tools/sandbox/test_result.py`:
- Around line 14-46: Combine the four separate tests into one parametrized
pytest function that covers the same cases: create a single test function (e.g.,
test_sandboxresult_success_matrix) using `@pytest.mark.parametrize` to pass
different tuples of (stdout, stderr, returncode, timed_out, expected_success),
instantiate SandboxResult with those parameters, and assert result.success ==
expected_success; reference the existing SandboxResult class and its success
property to build each case: ( "ok","",0,False,True ),
("","error",1,False,False), ("","timeout",0,True,False), ("","", -1,True,False
).
In `@tests/unit/tools/sandbox/test_subprocess_sandbox.py`:
- Around line 238-255: Add a regression test that exercises timeout=0.0
(distinct from None) by creating a new async test (e.g.,
test_zero_timeout_kills_process) using the same pattern as
test_timeout_kills_process: call SubprocessSandbox.execute with timeout=0.0,
branch on os.name to use the same Windows ("cmd", "/c", "ping", ...) and POSIX
("sleep", "10") commands, and assert result.timed_out is True and result.success
is False; this ensures the sandbox logic treats 0.0 as an explicit immediate
timeout rather than falling back to config.timeout_seconds.
---
Outside diff comments:
In `@tests/unit/observability/test_events.py`:
- Around line 80-98: Add a new unit test named test_sandbox_events_exist that
imports the events.sandbox module and asserts exact string values for each
SANDBOX_* constant (e.g., SANDBOX_FOO, SANDBOX_BAR — replace with the actual
constant names present in events.sandbox) rather than just checking discovery;
for each constant (SANDBOX_...) assert equality against the expected literal
event name string so any typo or accidental change fails the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2e6243be-d570-44c3-aed7-7f6ad02e23fc
📒 Files selected for processing (23)
DESIGN_SPEC.mdsrc/ai_company/observability/events/sandbox.pysrc/ai_company/tools/__init__.pysrc/ai_company/tools/_git_base.pysrc/ai_company/tools/git_tools.pysrc/ai_company/tools/sandbox/__init__.pysrc/ai_company/tools/sandbox/config.pysrc/ai_company/tools/sandbox/errors.pysrc/ai_company/tools/sandbox/protocol.pysrc/ai_company/tools/sandbox/result.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pytests/integration/tools/__init__.pytests/integration/tools/conftest.pytests/integration/tools/test_sandbox_integration.pytests/unit/observability/test_events.pytests/unit/tools/git/test_git_sandbox_integration.pytests/unit/tools/sandbox/__init__.pytests/unit/tools/sandbox/conftest.pytests/unit/tools/sandbox/test_config.pytests/unit/tools/sandbox/test_errors.pytests/unit/tools/sandbox/test_protocol.pytests/unit/tools/sandbox/test_result.pytests/unit/tools/sandbox/test_subprocess_sandbox.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotationsin Python code — Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:syntax (no parentheses) — ruff enforces PEP 758 except syntax on Python 3.14.
All public functions and classes must have type hints. Use mypy strict mode.
Use Google style docstrings, required on all public classes and functions (enforced by ruff D rules).
Create new objects instead of mutating existing ones. For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction andMappingProxyTypewrapping for read-only enforcement.
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention and usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Use@computed_fieldfor derived values instead of storing redundant fields. UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task. Existing code is being migrated incrementally.
Keep functions under 50 lines and files under 800 lines.
Handle errors explicitly, never silently swallow exceptions.
Validate at system boundaries (user input, external APIs, config files).
Set line length to 88 characters (ruff).
Files:
tests/integration/tools/__init__.pytests/unit/tools/sandbox/conftest.pysrc/ai_company/tools/sandbox/result.pytests/unit/tools/sandbox/test_config.pytests/unit/observability/test_events.pysrc/ai_company/observability/events/sandbox.pytests/integration/tools/conftest.pytests/integration/tools/test_sandbox_integration.pysrc/ai_company/tools/sandbox/__init__.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pytests/unit/tools/sandbox/__init__.pysrc/ai_company/tools/__init__.pysrc/ai_company/tools/git_tools.pytests/unit/tools/sandbox/test_result.pytests/unit/tools/sandbox/test_subprocess_sandbox.pytests/unit/tools/sandbox/test_protocol.pysrc/ai_company/tools/sandbox/config.pysrc/ai_company/tools/sandbox/errors.pytests/unit/tools/sandbox/test_errors.pysrc/ai_company/tools/sandbox/protocol.pysrc/ai_company/tools/_git_base.pytests/unit/tools/git/test_git_sandbox_integration.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use test markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Maintain 80% minimum code coverage (enforced in CI).
Prefer@pytest.mark.parametrizefor testing similar cases.
Files:
tests/integration/tools/__init__.pytests/unit/tools/sandbox/conftest.pytests/unit/tools/sandbox/test_config.pytests/unit/observability/test_events.pytests/integration/tools/conftest.pytests/integration/tools/test_sandbox_integration.pytests/unit/tools/sandbox/__init__.pytests/unit/tools/sandbox/test_result.pytests/unit/tools/sandbox/test_subprocess_sandbox.pytests/unit/tools/sandbox/test_protocol.pytests/unit/tools/sandbox/test_errors.pytests/unit/tools/git/test_git_sandbox_integration.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have:from ai_company.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging/logging.getLogger()/print()in application code.
Always useloggeras the variable name for the logger (not_logger, notlog).
Use event name constants from the domain-specific module underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT.
Always use structured logging kwargs:logger.info(EVENT, key=value)— neverlogger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2).claude/skill/agent files, (3) third-party import paths/module names. Tests must usetest-provider,test-small-001, etc.
Files:
src/ai_company/tools/sandbox/result.pysrc/ai_company/observability/events/sandbox.pysrc/ai_company/tools/sandbox/__init__.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pysrc/ai_company/tools/__init__.pysrc/ai_company/tools/git_tools.pysrc/ai_company/tools/sandbox/config.pysrc/ai_company/tools/sandbox/errors.pysrc/ai_company/tools/sandbox/protocol.pysrc/ai_company/tools/_git_base.py
🧠 Learnings (1)
📚 Learning: 2026-03-07T10:57:52.980Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T10:57:52.980Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/ai_company/observability/events/sandbox.py
🧬 Code graph analysis (12)
tests/unit/tools/sandbox/test_config.py (2)
src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
config(85-87)src/ai_company/tools/sandbox/config.py (1)
SubprocessSandboxConfig(6-60)
tests/integration/tools/conftest.py (3)
src/ai_company/tools/_git_base.py (1)
_run_git(388-420)src/ai_company/engine/agent_engine.py (1)
run(109-192)src/ai_company/tools/permissions.py (1)
check(150-171)
src/ai_company/tools/sandbox/__init__.py (5)
src/ai_company/tools/sandbox/subprocess_sandbox.py (2)
config(85-87)SubprocessSandbox(48-373)src/ai_company/tools/sandbox/config.py (1)
SubprocessSandboxConfig(6-60)src/ai_company/tools/sandbox/errors.py (3)
SandboxError(10-11)SandboxStartError(25-26)SandboxTimeoutError(14-22)src/ai_company/tools/sandbox/protocol.py (1)
SandboxBackend(15-74)src/ai_company/tools/sandbox/result.py (1)
SandboxResult(6-28)
src/ai_company/tools/__init__.py (5)
src/ai_company/tools/sandbox/protocol.py (1)
SandboxBackend(15-74)src/ai_company/tools/sandbox/errors.py (2)
SandboxError(10-11)SandboxStartError(25-26)src/ai_company/tools/sandbox/result.py (1)
SandboxResult(6-28)src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
SubprocessSandbox(48-373)src/ai_company/tools/sandbox/config.py (1)
SubprocessSandboxConfig(6-60)
src/ai_company/tools/git_tools.py (3)
src/ai_company/tools/sandbox/protocol.py (1)
SandboxBackend(15-74)src/ai_company/tools/_git_base.py (1)
workspace(127-129)src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
workspace(90-92)
tests/unit/tools/sandbox/test_result.py (2)
src/ai_company/tools/sandbox/result.py (2)
SandboxResult(6-28)success(26-28)tests/unit/tools/sandbox/test_config.py (1)
test_frozen(46-49)
tests/unit/tools/sandbox/test_subprocess_sandbox.py (3)
src/ai_company/tools/sandbox/subprocess_sandbox.py (9)
config(85-87)SubprocessSandbox(48-373)workspace(90-92)_build_filtered_env(153-196)_validate_cwd(198-219)execute(238-354)health_check(360-369)cleanup(356-358)get_backend_type(371-373)src/ai_company/tools/sandbox/config.py (1)
SubprocessSandboxConfig(6-60)src/ai_company/tools/sandbox/errors.py (2)
SandboxError(10-11)SandboxStartError(25-26)
tests/unit/tools/sandbox/test_protocol.py (4)
src/ai_company/tools/sandbox/protocol.py (5)
SandboxBackend(15-74)execute(24-50)cleanup(52-58)health_check(60-66)get_backend_type(68-74)src/ai_company/tools/sandbox/result.py (1)
SandboxResult(6-28)tests/unit/tools/sandbox/conftest.py (1)
subprocess_sandbox(24-32)src/ai_company/tools/sandbox/subprocess_sandbox.py (5)
SubprocessSandbox(48-373)execute(238-354)cleanup(356-358)health_check(360-369)get_backend_type(371-373)
src/ai_company/tools/sandbox/errors.py (1)
src/ai_company/tools/errors.py (1)
ToolError(12-44)
tests/unit/tools/sandbox/test_errors.py (2)
src/ai_company/tools/errors.py (1)
ToolError(12-44)src/ai_company/tools/sandbox/errors.py (3)
SandboxError(10-11)SandboxStartError(25-26)SandboxTimeoutError(14-22)
src/ai_company/tools/sandbox/protocol.py (3)
src/ai_company/tools/sandbox/result.py (1)
SandboxResult(6-28)src/ai_company/tools/sandbox/subprocess_sandbox.py (4)
execute(238-354)cleanup(356-358)health_check(360-369)get_backend_type(371-373)tests/unit/tools/sandbox/test_protocol.py (4)
execute(18-31)cleanup(33-34)health_check(36-37)get_backend_type(39-40)
src/ai_company/tools/_git_base.py (4)
src/ai_company/tools/sandbox/errors.py (1)
SandboxError(10-11)src/ai_company/tools/sandbox/protocol.py (2)
SandboxBackend(15-74)execute(24-50)src/ai_company/tools/sandbox/result.py (1)
SandboxResult(6-28)src/ai_company/tools/sandbox/subprocess_sandbox.py (2)
workspace(90-92)execute(238-354)
🪛 GitHub Actions: CI
src/ai_company/tools/sandbox/subprocess_sandbox.py
[error] 231-231: mypy: Unused "type: ignore" comment [unused-ignore].
🔇 Additional comments (3)
src/ai_company/tools/sandbox/errors.py (1)
10-26: Clean error layering.Keeping sandbox exceptions rooted in
ToolErrorgives callers one consistent error path and preserves the shared immutable context behavior.src/ai_company/tools/sandbox/result.py (1)
6-28: Nice use of a frozen computed result model.Deriving
successfromreturncodeandtimed_outavoids redundant state and keeps the execution result immutable.src/ai_company/observability/events/sandbox.py (1)
5-14: Good addition of a dedicated sandbox event domain.Centralizing the
SANDBOX_*names here keeps call sites aligned with the observability import pattern. Based on learnings, "Use event name constants from the domain-specific module underai_company.observability.events."
src/ai_company/tools/_git_base.py
Outdated
| except Exception as exc: | ||
| logger.error( | ||
| GIT_COMMAND_FAILED, | ||
| command=_sanitize_command(["git", *args]), | ||
| error=f"Unexpected sandbox error: {exc}", | ||
| exc_info=True, | ||
| ) | ||
| return ToolExecutionResult( | ||
| content=f"Sandbox error: {exc}", | ||
| is_error=True, | ||
| ) |
There was a problem hiding this comment.
Don't turn unexpected sandbox failures into normal tool errors.
Lines 451-460 catch every Exception from the sandbox and convert it into ToolExecutionResult. That masks programmer bugs and suppresses non-recoverable failures like MemoryError and RecursionError instead of letting them abort the run. Only SandboxError-family failures should be translated here; unexpected exceptions should be logged and re-raised.
Suggested fix
- except Exception as exc:
- logger.error(
- GIT_COMMAND_FAILED,
- command=_sanitize_command(["git", *args]),
- error=f"Unexpected sandbox error: {exc}",
- exc_info=True,
- )
- return ToolExecutionResult(
- content=f"Sandbox error: {exc}",
- is_error=True,
- )
+ except (MemoryError, RecursionError):
+ raise
+ except Exception as exc:
+ logger.error(
+ GIT_COMMAND_FAILED,
+ command=_sanitize_command(["git", *args]),
+ error=f"Unexpected sandbox error: {exc}",
+ exc_info=True,
+ )
+ raise📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as exc: | |
| logger.error( | |
| GIT_COMMAND_FAILED, | |
| command=_sanitize_command(["git", *args]), | |
| error=f"Unexpected sandbox error: {exc}", | |
| exc_info=True, | |
| ) | |
| return ToolExecutionResult( | |
| content=f"Sandbox error: {exc}", | |
| is_error=True, | |
| ) | |
| except (MemoryError, RecursionError): | |
| raise | |
| except Exception as exc: | |
| logger.error( | |
| GIT_COMMAND_FAILED, | |
| command=_sanitize_command(["git", *args]), | |
| error=f"Unexpected sandbox error: {exc}", | |
| exc_info=True, | |
| ) | |
| raise |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/_git_base.py` around lines 451 - 461, The except block
that currently catches all Exceptions and returns a ToolExecutionResult should
be changed to only handle sandbox-related errors (the SandboxError family); in
the function/method containing this block replace the generic "except Exception
as exc" with an explicit except for SandboxError (or the specific sandbox base
exception types used in this module), log and convert those to
ToolExecutionResult as done now (using GIT_COMMAND_FAILED, _sanitize_command,
logger.error and the existing content), but for any other unexpected exceptions
re-raise them after logging (do not return a ToolExecutionResult) so programmer
bugs and fatal errors (e.g., MemoryError, RecursionError) bubble up instead of
being swallowed.
| if TYPE_CHECKING: | ||
| from collections.abc import Mapping | ||
|
|
||
| from ai_company.tools.sandbox.result import SandboxResult |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from pathlib import Path
from typing import get_type_hints
import importlib.util
path = Path("src/ai_company/tools/sandbox/protocol.py")
spec = importlib.util.spec_from_file_location("sandbox_protocol", path)
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(module)
print(get_type_hints(module.SandboxBackend.execute))
PYRepository: Aureliolo/ai-company
Length of output: 633
Move Mapping and SandboxResult imports outside TYPE_CHECKING block.
The module fails to import with NameError: name 'Mapping' is not defined. These types are used in runtime annotations on the public SandboxBackend protocol (line 30: env_overrides: Mapping[str, str] | None), but are only defined under TYPE_CHECKING. Since the codebase does not use from __future__ import annotations, these annotations are evaluated at class definition time, causing an immediate import failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/sandbox/protocol.py` around lines 8 - 11, The TypeError
arises because Mapping and SandboxResult are only imported inside the
TYPE_CHECKING block but are used at runtime in the SandboxBackend protocol (see
SandboxBackend and its env_overrides: Mapping[str, str] | None annotation); move
the imports of Mapping (from collections.abc) and SandboxResult (from
ai_company.tools.sandbox.result) out of the TYPE_CHECKING block to top-level
imports so those names exist when the class/protocol is defined (alternatively,
keep them in TYPE_CHECKING and make the annotations stringified or enable from
__future__ import annotations, but the requested fix is to relocate the imports
for Mapping and SandboxResult out of the TYPE_CHECKING guard).
There was a problem hiding this comment.
Pull request overview
Implements a subprocess-based sandbox backend for isolating tool execution, adds environment/PATH/workspace restrictions, and wires optional sandbox execution into git tools while expanding observability and test coverage.
Changes:
- Added
SubprocessSandboxbackend with env filtering, restricted PATH, workspace boundary checks, and timeout + process-group kill handling. - Integrated optional
SandboxBackendinto git tool execution path and tightened clone scheme allowlist. - Added sandbox observability events and comprehensive unit/integration tests; updated design spec accordingly.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/ai_company/tools/sandbox/subprocess_sandbox.py |
New subprocess sandbox implementation (env filtering, PATH restriction, workspace enforcement, timeout handling, logging). |
src/ai_company/tools/sandbox/config.py |
New frozen config model for sandbox defaults and validation. |
src/ai_company/tools/sandbox/errors.py |
New sandbox error hierarchy rooted in ToolError. |
src/ai_company/tools/sandbox/protocol.py |
New SandboxBackend protocol (execute/cleanup/health_check/type). |
src/ai_company/tools/sandbox/result.py |
New frozen SandboxResult model with computed success. |
src/ai_company/tools/sandbox/__init__.py |
Package exports for sandbox public API. |
src/ai_company/tools/_git_base.py |
Adds sandbox-aware execution path, hardening env overrides, and extra defense-in-depth env filtering for direct subprocess. |
src/ai_company/tools/git_tools.py |
Adds optional sandbox injection to all git tools; removes http:// from allowed clone schemes. |
src/ai_company/tools/__init__.py |
Re-exports sandbox types from the top-level tools package. |
src/ai_company/observability/events/sandbox.py |
Adds SANDBOX_* event constants. |
tests/unit/tools/sandbox/conftest.py |
Adds fixtures for sandbox unit tests. |
tests/unit/tools/sandbox/test_subprocess_sandbox.py |
Unit tests for sandbox constructor, env filtering, cwd enforcement, execute, timeout, and health/cleanup. |
tests/unit/tools/sandbox/test_config.py |
Unit tests for config defaults, validation, and immutability. |
tests/unit/tools/sandbox/test_errors.py |
Unit tests for sandbox error hierarchy and context immutability. |
tests/unit/tools/sandbox/test_protocol.py |
Unit tests verifying runtime-checkable protocol behavior. |
tests/unit/tools/sandbox/test_result.py |
Unit tests for SandboxResult semantics and frozen behavior. |
tests/unit/tools/sandbox/__init__.py |
Marks sandbox unit test package. |
tests/unit/tools/git/test_git_sandbox_integration.py |
Unit-level integration tests ensuring git tools work with/without sandbox and surface sandbox failures. |
tests/unit/observability/test_events.py |
Includes sandbox in the discovered observability domains list. |
tests/integration/tools/conftest.py |
Adds integration fixture for creating a real git repo for sandbox e2e tests. |
tests/integration/tools/test_sandbox_integration.py |
E2E tests running real git commands via sandbox and validating workspace/timeout behavior. |
tests/integration/tools/__init__.py |
Marks integration tools test package. |
DESIGN_SPEC.md |
Updates sandbox backend status/details and directory/event listings. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| safe_prefixes = self._get_safe_path_prefixes() | ||
| entries = path_value.split(_PATH_SEP) | ||
| filtered = [ | ||
| e | ||
| for e in entries | ||
| if any(e.lower().startswith(prefix.lower()) for prefix in safe_prefixes) | ||
| ] |
There was a problem hiding this comment.
_filter_path() currently treats a PATH entry as safe if it merely starts with a safe prefix. This can be bypassed with traversal components (e.g. /usr/bin/../../home/user/bin) that still start with /usr/bin but resolve outside the intended safe directories. Consider normalizing each entry (e.g., Path(entry).resolve() when possible) and then checking it is equal to or contained within a safe directory, or at least rejecting entries containing .. segments.
| """Check if an env var name matches any denylist pattern.""" | ||
| upper = name.upper() | ||
| return any( | ||
| fnmatch.fnmatch(upper, pat) for pat in self._config.env_denylist_patterns | ||
| ) | ||
|
|
There was a problem hiding this comment.
_matches_denylist() uppercases the env var name but compares it against patterns without normalizing the pattern case. This makes denylist behavior depend on whether callers provide uppercase patterns (works with defaults) and can fail for lowercase custom patterns. Consider uppercasing (or casefold()-ing) the patterns once (e.g., in config validation or at comparison time) for consistent case-insensitive matching.
| """Check if an env var name matches any denylist pattern.""" | |
| upper = name.upper() | |
| return any( | |
| fnmatch.fnmatch(upper, pat) for pat in self._config.env_denylist_patterns | |
| ) | |
| """Check if an env var name matches any denylist pattern. | |
| Uses case-insensitive matching on Windows where env var names | |
| are case-insensitive, mirroring allowlist behavior. | |
| """ | |
| check_name = name.upper() if os.name == "nt" else name | |
| for pattern in self._config.env_denylist_patterns: | |
| check_pattern = pattern.upper() if os.name == "nt" else pattern | |
| if fnmatch.fnmatch(check_name, check_pattern): | |
| return True | |
| return False |
src/ai_company/tools/_git_base.py
Outdated
| logger.warning( | ||
| GIT_COMMAND_TIMEOUT, | ||
| command=_sanitize_command(["git", *args]), | ||
| deadline="sandbox", |
There was a problem hiding this comment.
In the sandboxed timeout path, GIT_COMMAND_TIMEOUT is logged with deadline="sandbox", whereas the direct path logs a numeric deadline (seconds). Mixing types for the same field can break downstream parsing/metrics. Consider logging the actual numeric timeout used (available from the sandbox call) and/or using a separate field (e.g. backend="sandbox") to indicate the source.
| deadline="sandbox", | |
| backend="sandbox", |
…, Greptile, and Copilot - Harden PATH filtering with normpath + os.sep boundary check (prefix spoofing fix) - Fix case-insensitive denylist matching (uppercase both name and pattern) - Add credential redaction in logged args via _redact_args helper - Split execute() into _spawn_process, _communicate_with_timeout, _drain_after_kill - Fix _kill_process double-kill + add ProcessLookupError handling - Add SANDBOX_KILL_FAILED event constant for unkillable processes - Remove broad except Exception from _run_git_sandboxed - Add _sandbox_result_to_execution_result deadline parameter - Make _MAX_COUNT_LIMIT Final, update GitCloneTool docstring - Update DESIGN_SPEC.md §11.1.1, §11.2, §15.5 for sandbox status - Update CLAUDE.md package structure with sandbox - Make git test fixtures hermetic (GIT_CONFIG_GLOBAL=os.devnull) - Add tests: env_overrides bypass, prefix spoofing, path fallback, zero timeout - Add parametrized SandboxResult success matrix test - Add sandbox events existence test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| if os.name == "nt": | ||
| system_root = os.environ.get("SYSTEMROOT", r"C:\WINDOWS") | ||
| return ( | ||
| system_root, | ||
| str(Path(system_root) / "system32"), | ||
| r"C:\Program Files\Git", | ||
| r"C:\Program Files (x86)\Git", | ||
| ) | ||
| return ("/usr/bin", "/usr/local/bin", "/bin", "/usr/sbin", "/sbin") |
There was a problem hiding this comment.
Hardcoded Windows Git paths bypass caller-controlled PATH restriction
_get_safe_path_prefixes unconditionally includes r"C:\Program Files\Git" and r"C:\Program Files (x86)\Git" as safe path prefixes regardless of env_allowlist or env_denylist_patterns. A caller who instantiates the sandbox with an explicit SubprocessSandboxConfig cannot opt out of these entries because the prefix list is hard-wired in a @staticmethod. If the intent is to restrict git to a particular installation (or to deny git entirely), those entries would need to be removed manually.
This is unlikely to cause issues for the current use case (git tools always need git on PATH), but the tight coupling between "safe PATH prefixes" and the sandbox config is worth documenting — or the prefix list could be made a config field so callers can override it.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/sandbox/subprocess_sandbox.py
Line: 194-202
Comment:
**Hardcoded Windows Git paths bypass caller-controlled PATH restriction**
`_get_safe_path_prefixes` unconditionally includes `r"C:\Program Files\Git"` and `r"C:\Program Files (x86)\Git"` as safe path prefixes regardless of `env_allowlist` or `env_denylist_patterns`. A caller who instantiates the sandbox with an explicit `SubprocessSandboxConfig` cannot opt out of these entries because the prefix list is hard-wired in a `@staticmethod`. If the intent is to restrict git to a particular installation (or to deny git entirely), those entries would need to be removed manually.
This is unlikely to cause issues for the current use case (git tools always need git on `PATH`), but the tight coupling between "safe PATH prefixes" and the sandbox config is worth documenting — or the prefix list could be made a config field so callers can override it.
How can I resolve this? If you propose a fix, please make it concise.| f"Invalid clone URL. Only {schemes}" | ||
| "and SCP-like (user@host:path) URLs are " | ||
| "allowed" |
There was a problem hiding this comment.
Missing space in error message string concatenation
The two adjacent string literals are concatenated without a space. The resulting message reads "Invalid clone URL. Only https://, ssh://, git://and SCP-like…" — the last scheme runs directly into the word "and" with no separator.
| f"Invalid clone URL. Only {schemes}" | |
| "and SCP-like (user@host:path) URLs are " | |
| "allowed" | |
| content=( | |
| f"Invalid clone URL. Only {schemes} " | |
| "and SCP-like (user at host:path) URLs are " | |
| "allowed" | |
| ), |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/git_tools.py
Line: 690-692
Comment:
**Missing space in error message string concatenation**
The two adjacent string literals are concatenated without a space. The resulting message reads `"Invalid clone URL. Only https://, ssh://, git://and SCP-like…"` — the last scheme runs directly into the word `"and"` with no separator.
```suggestion
content=(
f"Invalid clone URL. Only {schemes} "
"and SCP-like (user at host:path) URLs are "
"allowed"
),
```
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
SubprocessSandbox) implementing theSandboxBackendprotocol for tool execution isolationLD_PRELOAD,PYTHONPATH, etc.), case-insensitive matching on Windowscwdoutside the workspaceGitStatusTool,GitLogTool,GitDiffTool,GitBranchTool,GitCommitTool,GitCloneTool) accept optionalSandboxBackendvia_BaseGitTool, with git hardening env vars passed asenv_overridesMappingProxyTypefor hardening overrides,http://removed from allowed clone schemestimeout=0.0no longer silently uses default,returncode=0no longer mapped to-1, properraise from excchainingsandbox.pyto events listingFiles
New
src/ai_company/tools/sandbox/—__init__.py,config.py,errors.py,protocol.py,result.py,subprocess_sandbox.pysrc/ai_company/observability/events/sandbox.py— 11 SANDBOX_* event constantstests/unit/tools/sandbox/— config, errors, protocol, result, subprocess_sandbox teststests/unit/tools/git/test_git_sandbox_integration.py— sandbox integration for 5 git toolstests/integration/tools/test_sandbox_integration.py— real git + sandbox e2etests/integration/tools/conftest.py— sharedgit_repofixtureModified
src/ai_company/tools/_git_base.py— sandbox integration, secret filtering,MappingProxyTypesrc/ai_company/tools/git_tools.py— removehttp://from clone schemesDESIGN_SPEC.md— §11.1.2 and §15.3 updatesTest plan
uv run pytest tests/ -n auto)Review coverage
Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 21 findings triaged and addressed.
Closes #131