fix: harden subprocess cleanup, env filtering, and shutdown resilience#155
fix: harden subprocess cleanup, env filtering, and shutdown resilience#155
Conversation
CI fixes: - Remove unused type: ignore on Linux (use dual suppression for cross-platform compat) in subprocess_sandbox.py - Fix test assertion for raise-from-build_exc chaining in test_agent_engine_errors.py Review feedback (Greptile): - Rename _check_ref → _check_git_arg with updated docstring (was misleadingly named when used for author/date validation) - Fix missing space in clone URL error message - Add EXECUTION_SHUTDOWN_CLEANUP_FAILED event constant (was reusing EXECUTION_SHUTDOWN_CLEANUP for both start and failure) - Capture partial stderr on timeout in direct subprocess path (mirrors sandbox behavior for better debuggability) - Make Windows Git PATH prefixes configurable via SubprocessSandboxConfig.extra_safe_path_prefixes - Add clarifying comment for total_entries semantics when truncated - Fix ResourceWarning on Windows by explicitly closing subprocess transports after communicate() Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document Python 3.14+, uv, and Git 2.x+ as system-level prerequisites. Git is a runtime dependency for the built-in git tools (subprocess-based, not managed by pip/uv). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 15 findings addressed: - Extract shared subprocess transport cleanup utility (_process_cleanup.py) - Add stderr sanitization (control char stripping + truncation) to git timeout path - Add field_validator for extra_safe_path_prefixes (reject empty/relative paths) - Log killpg OSError in sandbox instead of silently suppressing - Extract _log_post_cancel_exceptions in shutdown to reduce complexity - Add stderr.write fallback in Windows signal handler (no event loop case) - Update DESIGN_SPEC.md §11.1.1 and §11.1.2 for accuracy - Fix inaccurate comment in list_directory.py - Replace inline logging string with event constant - Add tests for timeout stderr, extra_safe_path_prefixes validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughPolicy docs now mandate fixing all valid findings; adds subprocess transport cleanup utility and observability keys; strengthens git arg validation and stderr sanitization; extends sandbox config with extra safe PATH prefixes and ensures subprocess transport cleanup; updates shutdown logging/handling and associated tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant Sandbox as SubprocessSandbox
participant Subproc as asyncio.subprocess.Process
participant Cleanup as close_subprocess_transport
participant Obs as Observability
rect rgba(200,200,255,0.5)
Caller->>Sandbox: execute(command, ...)
end
Sandbox->>Subproc: spawn subprocess
Note right of Subproc: run / communicate / timeout possible
alt timeout or normal completion
Sandbox->>Subproc: communicate_with_timeout()
end
rect rgba(200,255,200,0.5)
Sandbox->>Cleanup: _close_process(proc)
Cleanup->>Subproc: access _transport and close()
Cleanup->>Obs: on close failure -> emit TOOL_SUBPROCESS_TRANSPORT_CLOSE_FAILED
end
alt kill failed (OSError)
Sandbox->>Obs: emit SANDBOX_KILL_FALLBACK with pid/error
Sandbox->>Subproc: attempt proc.kill() (suppressed)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness, security, and maintainability of the system by focusing on subprocess management, error handling, and shutdown procedures. It centralizes subprocess cleanup, improves the sanitization and reporting of errors, and strengthens the sandbox environment's path filtering capabilities. Additionally, it refines the internal review process guidelines to ensure comprehensive issue resolution, contributing to a more stable and secure codebase. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant hardening and resilience improvements across subprocess handling, shutdown logic, and environment filtering. Key changes include extracting shared subprocess cleanup logic into _process_cleanup.py, hardening stderr handling in the git timeout path, and enhancing shutdown resilience. However, a potential security weakness was identified in the stderr sanitization logic for git timeout messages, which could be exploited for indirect prompt injection due to incomplete stripping of control characters (specifically newlines). Additionally, consider making the new process cleanup utility more explicit about its platform-specific nature.
src/ai_company/tools/_git_base.py
Outdated
| ) | ||
|
|
||
|
|
||
| _CONTROL_CHAR_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]") |
There was a problem hiding this comment.
The _sanitize_stderr function uses an incomplete regex for stripping control characters. Specifically, the _CONTROL_CHAR_RE regex [\x00-\x08\x0b-\x1f\x7f] does not include the newline character (\x0a or \n). Since the sanitized stderr_fragment is included in the ToolExecutionResult content (which is fed back to the LLM), an attacker who can control the stderr of a git command (e.g., via a malicious repository or a slow server causing a timeout) could inject newlines followed by a prompt injection payload. This could manipulate the agent's behavior.
Consider updating the regex to include all control characters, including newlines and tabs.
| def close_subprocess_transport(proc: asyncio.subprocess.Process) -> None: | ||
| """Close subprocess transport to prevent ResourceWarning on Windows. | ||
|
|
||
| On Windows with ProactorEventLoop, pipe transports may not be | ||
| closed promptly after kill+communicate, causing ResourceWarning | ||
| at GC time. Explicitly closing the transport avoids this. | ||
|
|
||
| Uses ``getattr`` to access the CPython-internal ``_transport`` | ||
| attribute — if the attribute is absent (different runtime or | ||
| future CPython version), this is a no-op. Exceptions from | ||
| ``close()`` are logged and suppressed so cleanup never masks | ||
| the primary result. | ||
|
|
||
| Args: | ||
| proc: The subprocess whose transport should be closed. | ||
| """ | ||
| transport = getattr(proc, "_transport", None) | ||
| if transport is None or transport.is_closing(): | ||
| return | ||
| try: | ||
| transport.close() | ||
| except Exception: | ||
| logger.debug( | ||
| TOOL_SUBPROCESS_TRANSPORT_CLOSE_FAILED, | ||
| exc_info=True, | ||
| ) |
There was a problem hiding this comment.
While the use of getattr makes this function safe on platforms where _transport doesn't exist, the docstring explicitly mentions this is a workaround for ProactorEventLoop on Windows. To make the intent clearer and avoid any potential side effects on other platforms, consider adding a platform check at the beginning of the function to make it a no-op on non-Windows systems.
def close_subprocess_transport(proc: asyncio.subprocess.Process) -> None:
"""Close subprocess transport to prevent ResourceWarning on Windows.
On Windows with ProactorEventLoop, pipe transports may not be
closed promptly after kill+communicate, causing ResourceWarning
at GC time. Explicitly closing the transport avoids this.
Uses ``getattr`` to access the CPython-internal ``_transport``
attribute — if the attribute is absent (different runtime or
future CPython version), this is a no-op. Exceptions from
``close()`` are logged and suppressed so cleanup never masks
the primary result.
Args:
proc: The subprocess whose transport should be closed.
"""
import os
if os.name != "nt":
return
transport = getattr(proc, "_transport", None)
if transport is None or transport.is_closing():
return
try:
transport.close()
except Exception:
logger.debug(
TOOL_SUBPROCESS_TRANSPORT_CLOSE_FAILED,
exc_info=True,
)There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/tools/_git_base.py (1)
321-346:⚠️ Potential issue | 🟠 MajorApply the timeout stderr sanitization to sandboxed git runs too.
Lines 321-346 harden only the direct-subprocess timeout path. When
self._sandboxis set,_run_git()returns through_sandbox_result_to_execution_result(), which still surfaces rawresult.stderron timeout, so control characters and oversized stderr bypass this new hardening entirely.🔧 Suggested change
def _sandbox_result_to_execution_result( args: list[str], result: SandboxResult, *, deadline: float, @@ if result.timed_out: + stderr_fragment = _sanitize_stderr(result.stderr.strip()) logger.warning( GIT_COMMAND_TIMEOUT, command=_sanitize_command(["git", *args]), deadline=deadline, + stderr_fragment=stderr_fragment, ) + msg = f"Git command timed out after {deadline}s" + if stderr_fragment: + msg += f": {stderr_fragment}" return ToolExecutionResult( - content=result.stderr or "Git command timed out", + content=msg, is_error=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/tools/_git_base.py` around lines 321 - 346, The sandbox path still returns raw stderr on timeout; update _sandbox_result_to_execution_result to apply the same stderr sanitization and truncation used in _run_git (use _sanitize_stderr) before building the ToolExecutionResult so control characters and oversized stderr are never exposed; specifically, detect timeout cases in _sandbox_result_to_execution_result (where self._sandbox results are converted), run _sanitize_stderr(result.stderr.decode(...)) and pass the sanitized fragment into the returned ToolExecutionResult content/fields (and any logged messages) instead of the raw result.stderr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/engine/shutdown.py`:
- Around line 263-266: The try/except around task.exception() currently swallows
errors; replace the silent pass with a debug-level log that records the failure
and the task identity so shutdown diagnostics aren’t lost. Specifically, in the
shutdown helper where you call task.exception() and assign exc, catch Exception
and call the module logger (e.g., logger.debug(...) or
process_logger.debug(...)) including the task variable and the exception info
(exc_info=True or use logger.exception at debug level) so unexpected failures
retrieving task.exception() are recorded.
In `@src/ai_company/tools/_process_cleanup.py`:
- Around line 36-40: The is_closing() check on the transport is unguarded and
can raise if the method is missing; wrap the probe in the same safety as close()
by moving the is_closing() call into the try/except block (or defensively check
callable(getattr(transport, "is_closing", None)) before invoking) so that both
transport.is_closing() and transport.close() are protected from unexpected
exceptions; update references in this block that use proc._transport,
transport.is_closing, and transport.close to ensure any error from either method
is caught and handled consistently.
In `@src/ai_company/tools/file_system/list_directory.py`:
- Around line 234-238: The truncation flag is computed from len(lines) which is
wrong because _list_sync stops after MAX_ENTRIES + 1 raw entries and
_classify_entry may drop some, so len(lines) <= MAX_ENTRIES does not guarantee
completeness; modify the listing logic to use the explicit raw-truncation
indicator from _list_sync (e.g., a returned boolean like raw_truncated or an
extra sentinel entry) to set metadata["truncated"] and compute total_entries
from the true count of visible entries (post-classification) while preserving
MAX_ENTRIES semantics; update the code paths around _list_sync, _classify_entry,
lines, MAX_ENTRIES, metadata["truncated"], and total_entries so truncated is
derived from the raw-truncated flag rather than len(lines) > MAX_ENTRIES.
---
Outside diff comments:
In `@src/ai_company/tools/_git_base.py`:
- Around line 321-346: The sandbox path still returns raw stderr on timeout;
update _sandbox_result_to_execution_result to apply the same stderr sanitization
and truncation used in _run_git (use _sanitize_stderr) before building the
ToolExecutionResult so control characters and oversized stderr are never
exposed; specifically, detect timeout cases in
_sandbox_result_to_execution_result (where self._sandbox results are converted),
run _sanitize_stderr(result.stderr.decode(...)) and pass the sanitized fragment
into the returned ToolExecutionResult content/fields (and any logged messages)
instead of the raw result.stderr.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3cf70ca9-ab84-4204-9bf4-bf478bb55b10
📒 Files selected for processing (18)
.claude/skills/aurelio-review-pr/SKILL.md.claude/skills/pre-pr-review/SKILL.mdCLAUDE.mdDESIGN_SPEC.mdREADME.mdsrc/ai_company/engine/shutdown.pysrc/ai_company/observability/events/execution.pysrc/ai_company/observability/events/sandbox.pysrc/ai_company/observability/events/tool.pysrc/ai_company/tools/_git_base.pysrc/ai_company/tools/_process_cleanup.pysrc/ai_company/tools/file_system/list_directory.pysrc/ai_company/tools/git_tools.pysrc/ai_company/tools/sandbox/config.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pytests/unit/engine/test_agent_engine_errors.pytests/unit/tools/git/test_git_tools.pytests/unit/tools/sandbox/test_subprocess_sandbox.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Never usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (without parentheses) as enforced by ruff for PEP 758 on Python 3.14
All public functions must have type hints and be compatible with mypy strict mode
All public classes and functions must have Google-style docstrings (enforced by ruff D rules)
Create new objects instead of mutating existing ones — implement immutability
For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention and usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one Pydantic model
Use Pydantic v2 withBaseModel,model_validator,computed_field, andConfigDict
Use@computed_fieldfor derived values instead of storing and validating redundant fields (e.g.,TokenUsage.total_tokens)
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — use structured concurrency over barecreate_task
Maintain line length of 88 characters (enforced by ruff)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
src/ai_company/observability/events/sandbox.pysrc/ai_company/observability/events/tool.pytests/unit/engine/test_agent_engine_errors.pysrc/ai_company/tools/_git_base.pysrc/ai_company/observability/events/execution.pysrc/ai_company/tools/git_tools.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pysrc/ai_company/tools/sandbox/config.pytests/unit/tools/sandbox/test_subprocess_sandbox.pysrc/ai_company/tools/_process_cleanup.pytests/unit/tools/git/test_git_tools.pysrc/ai_company/engine/shutdown.pysrc/ai_company/tools/file_system/list_directory.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import logging viafrom ai_company.observability import get_loggerand createlogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code
Always name the logger variablelogger(not_logger, notlog)
Always use event name constants fromai_company.observability.eventsdomain-specific modules (e.g.,PROVIDER_CALL_STARTfromevents.provider) — import directly from the specific events module
Use structured logging withlogger.info(EVENT, key=value)syntax — never use format strings likelogger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG log level for object creation, internal flow, and entry/exit of key functions
All provider calls must go throughBaseCompletionProviderwhich applies retry and rate limiting automatically
Never implement retry logic in driver subclasses or calling code — retry is handled byBaseCompletionProviderbase class
SetRetryConfigandRateLimiterConfigper-provider inProviderConfig
Retryable errors includeRateLimitError,ProviderTimeoutError,ProviderConnectionError, andProviderInternalError— all withis_retryable=True
Non-retryable errors must raise immediately without retry
RetryExhaustedErrorsignals all retries have failed — engine layer catches this to trigger fallback chains
Rate limiter must respectRateLimitError.retry_afterfrom providers and automatically pause future requests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names:example-provider,example-large-001,example-medium-001,example-small-001, orlarge/medium/smallas aliases
Files:
src/ai_company/observability/events/sandbox.pysrc/ai_company/observability/events/tool.pysrc/ai_company/tools/_git_base.pysrc/ai_company/observability/events/execution.pysrc/ai_company/tools/git_tools.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pysrc/ai_company/tools/sandbox/config.pysrc/ai_company/tools/_process_cleanup.pysrc/ai_company/engine/shutdown.pysrc/ai_company/tools/file_system/list_directory.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark unit tests with@pytest.mark.unit
Mark integration tests with@pytest.mark.integration
Mark end-to-end tests with@pytest.mark.e2e
Mark slow tests with@pytest.mark.slow
Maintain 80% minimum code coverage (enforced in CI)
All tests must complete within 30 seconds timeout
Usepytest-xdistvia-n autofor parallel test execution
Prefer@pytest.mark.parametrizefor testing similar cases
Usetest-provider,test-small-001(not vendor-specific names) in tests
Files:
tests/unit/engine/test_agent_engine_errors.pytests/unit/tools/sandbox/test_subprocess_sandbox.pytests/unit/tools/git/test_git_tools.py
🧠 Learnings (9)
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: For trivial/docs-only changes use `/pre-pr-review quick` to skip agents but still run automated checks
Applied to files:
.claude/skills/aurelio-review-pr/SKILL.md.claude/skills/pre-pr-review/SKILL.mdCLAUDE.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Surface improvements as suggestions, not silent changes — user decides whether to proceed
Applied to files:
.claude/skills/pre-pr-review/SKILL.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Present every implementation plan to the user for accept/deny before coding starts
Applied to files:
.claude/skills/pre-pr-review/SKILL.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: NEVER create a PR directly — use `/pre-pr-review` to create PRs with automated checks and review agents
Applied to files:
.claude/skills/pre-pr-review/SKILL.mdCLAUDE.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Use `/aurelio-review-pr` to handle external reviewer feedback after PR is created
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Run pre-commit hooks (trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch, ruff check+format, gitleaks) before committing
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: After finishing an issue implementation, create a feature branch (`<type>/<slug>`), commit, and push — do NOT create a PR automatically
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Update `DESIGN_SPEC.md` to reflect approved deviations when they occur
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue
Applied to files:
DESIGN_SPEC.md
🧬 Code graph analysis (6)
src/ai_company/tools/_git_base.py (2)
src/ai_company/tools/_process_cleanup.py (1)
close_subprocess_transport(20-45)src/ai_company/tools/base.py (1)
ToolExecutionResult(24-53)
src/ai_company/tools/git_tools.py (1)
src/ai_company/tools/_git_base.py (1)
_check_git_arg(198-227)
src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
src/ai_company/tools/_process_cleanup.py (1)
close_subprocess_transport(20-45)
tests/unit/tools/sandbox/test_subprocess_sandbox.py (2)
src/ai_company/tools/sandbox/subprocess_sandbox.py (5)
config(108-110)SubprocessSandbox(61-533)workspace(113-115)_get_safe_path_prefixes(193-210)_build_filtered_env(212-255)src/ai_company/tools/sandbox/config.py (1)
SubprocessSandboxConfig(8-83)
src/ai_company/tools/_process_cleanup.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
tests/unit/tools/git/test_git_tools.py (3)
tests/unit/tools/git/conftest.py (2)
status_tool(123-125)_run_git(31-39)src/ai_company/tools/git_tools.py (1)
GitStatusTool(60-123)src/ai_company/tools/_git_base.py (1)
_run_git(439-471)
🪛 LanguageTool
.claude/skills/aurelio-review-pr/SKILL.md
[style] ~320-~320: To elevate your writing, try using a synonym here.
Context: ...ting" or "out of scope" — if a reviewer found a valid issue in code touched by (or ad...
(FIND_LOCATE)
.claude/skills/pre-pr-review/SKILL.md
[style] ~294-~294: To elevate your writing, try using a synonym here.
Context: ...isting" or "out of scope" — if an agent found a valid issue in code touched by (or ad...
(FIND_LOCATE)
CLAUDE.md
[style] ~132-~132: To elevate your writing, try using a synonym here.
Context: ...alid — never skip**: When review agents find valid issues (including pre-existing is...
(FIND_LOCATE)
🔇 Additional comments (15)
README.md (1)
39-44: LGTM!The new System Requirements section is well-placed before "Getting Started" and accurately documents the runtime dependencies (Python 3.14+, uv, Git 2.x+). This aligns with the Tech Stack section and provides clear prerequisites for users.
tests/unit/engine/test_agent_engine_errors.py (1)
264-265: LGTM!The updated assertions properly verify exception chaining behavior — confirming that when
_handle_fatal_errorencounters a secondary failure, the originalRuntimeErroris raised with the secondaryValueErrorcorrectly set as its__cause__. This provides stronger validation of the error propagation contract.src/ai_company/observability/events/tool.py (1)
42-46: LGTM!The new
TOOL_SUBPROCESS_TRANSPORT_CLOSE_FAILEDconstant follows established patterns: properFinal[str]typing, consistentTOOL_*prefix, hierarchical dot-notation value, and organized under a clear section comment.src/ai_company/observability/events/execution.py (1)
45-45: LGTM!The new
EXECUTION_SHUTDOWN_CLEANUP_FAILEDconstant is logically placed between the cleanup start and timeout events, completing the shutdown cleanup event hierarchy.src/ai_company/observability/events/sandbox.py (1)
16-16: LGTM!The new
SANDBOX_KILL_FALLBACKconstant is appropriately placed afterSANDBOX_KILL_FAILED, logically grouping kill-related events together..claude/skills/aurelio-review-pr/SKILL.md (2)
320-327: LGTM!The updated policy clearly communicates the new default behavior: implement all valid findings including pre-existing issues. The options list properly reflects "Implement all (Recommended)" as the preferred choice.
414-414: LGTM!The new rule "Fix everything valid — never defer, never skip" is consistent with the updated Phase 6 guidance and reinforces the policy throughout the document.
.claude/skills/pre-pr-review/SKILL.md (2)
293-301: LGTM!The policy update mirrors the changes in
aurelio-review-pr/SKILL.md, maintaining consistency across review pipelines. Users retain the ability to skip items viaAskUserQuestion, but the default now emphasizes comprehensive fixes.
399-399: LGTM!The rule "Fix everything valid — never defer, never skip" completes the policy alignment with the Aurelio review pipeline.
CLAUDE.md (1)
132-132: LGTM!The new "Fix everything valid — never skip" policy is appropriately placed in the Pre-PR Review section and provides clear, actionable guidance consistent with the SKILL.md updates.
src/ai_company/tools/sandbox/subprocess_sandbox.py (5)
17-32: LGTM!The new imports are correctly placed and follow existing conventions.
SANDBOX_KILL_FALLBACKis properly imported from the domain-specific events module, andclose_subprocess_transportis imported from the shared process cleanup utility.
193-210: LGTM!The refactoring from static method to instance method is correctly implemented. The method needs
selfto accessself._config.extra_safe_path_prefixes, and the tuple concatenation at line 210 properly creates a new tuple without mutating the defaults.
289-306: LGTM!The error handling improvement properly addresses the PR objective of logging
OSErrorfromkillpginstead of suppressing it. The structured logging at WARNING level withpidanderrorcontext follows the coding guidelines. The fallback toproc.kill()withProcessLookupErrorsuppression is appropriate—otherOSErrortypes would propagate, which is explicit error handling rather than silent swallowing.
308-316: LGTM!Good abstraction that delegates to the shared
close_subprocess_transportutility. This eliminates duplication and centralizes the CPython-internal_transportaccess pattern. The docstring properly references the target function for implementation details, and the cleanup semantics (log and suppress exceptions) are appropriate for operations that shouldn't mask primary results.
469-481: LGTM!The
try/finallypattern properly ensures subprocess transport cleanup regardless of success, timeout, or exceptions. This addresses the WindowsResourceWarningissue mentioned in the PR objectives. The_close_processcall is idempotent sinceclose_subprocess_transportchecksis_closing()before attempting to close.
There was a problem hiding this comment.
Pull request overview
This PR hardens subprocess-based tooling and shutdown behavior by extracting shared cleanup utilities, tightening git timeout error reporting, and adding validation/config knobs for sandbox PATH filtering, alongside related tests and documentation updates.
Changes:
- Add shared asyncio subprocess transport cleanup helper and apply it in git + sandbox subprocess paths.
- Improve git timeout error messages by including a sanitized stderr fragment; add config validation/tests for sandbox
extra_safe_path_prefixes. - Improve shutdown cleanup robustness and observability (post-cancel exception logging, new cleanup-failure event, Windows stderr fallback).
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/tools/sandbox/test_subprocess_sandbox.py | Adds unit tests for extra_safe_path_prefixes behavior and validation. |
| tests/unit/tools/git/test_git_tools.py | Adds tests ensuring git timeout errors include/sanitize stderr fragments. |
| tests/unit/engine/test_agent_engine_errors.py | Adjusts fatal-error chaining assertion to validate the secondary failure cause. |
| src/ai_company/tools/sandbox/subprocess_sandbox.py | Uses shared transport cleanup and logs killpg fallback failures via a new sandbox event. |
| src/ai_company/tools/sandbox/config.py | Introduces extra_safe_path_prefixes with Pydantic validation for non-empty absolute paths. |
| src/ai_company/tools/git_tools.py | Renames/refactors ref/arg flag-injection validation calls; fixes clone URL error spacing. |
| src/ai_company/tools/file_system/list_directory.py | Updates comments clarifying total semantics and truncation behavior. |
| src/ai_company/tools/_process_cleanup.py | New shared helper to close subprocess transports safely without masking primary errors. |
| src/ai_company/tools/_git_base.py | Adds stderr sanitization + inclusion on timeout; ensures transport cleanup in direct subprocess path. |
| src/ai_company/observability/events/tool.py | Adds tool event constant for subprocess transport close failures. |
| src/ai_company/observability/events/sandbox.py | Adds sandbox event constant for kill fallback logging. |
| src/ai_company/observability/events/execution.py | Adds execution shutdown cleanup-failure event constant. |
| src/ai_company/engine/shutdown.py | Refactors post-cancel exception retrieval/logging; improves cleanup failure logging and Windows signal fallback visibility. |
| README.md | Documents runtime/system requirements (Python 3.14+, uv, Git). |
| DESIGN_SPEC.md | Updates git hardening and sandbox feature descriptions for current behavior. |
| CLAUDE.md | Updates contributor workflow guidance (“fix everything valid — never skip”). |
| .claude/skills/pre-pr-review/SKILL.md | Aligns skill instructions with “implement all valid findings” default behavior. |
| .claude/skills/aurelio-review-pr/SKILL.md | Aligns skill instructions with “implement all valid findings” default behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/ai_company/tools/_git_base.py
Outdated
| def _sanitize_stderr(raw: str) -> str: | ||
| """Strip control characters and truncate stderr for safe inclusion.""" | ||
| return _CONTROL_CHAR_RE.sub("", raw)[:_MAX_STDERR_FRAGMENT] |
There was a problem hiding this comment.
_sanitize_stderr() claims to "strip control characters", but the regex leaves common control chars like \n, \r, and \t. Because the sanitized fragment is included both in the tool-facing timeout message and in the GIT_COMMAND_TIMEOUT log event, this can still allow multi-line/log-injection style output. Consider either expanding the regex (or normalizing whitespace) to remove/replace \r/\n/\t, or tighten the docstring to match the actual sanitization guarantees.
| # total reflects classified entries (not raw filesystem entries). | ||
| # _list_sync's islice caps scanning at MAX_ENTRIES + 1, but | ||
| # _classify_entry may drop some entries (e.g. non-symlinks outside | ||
| # workspace), so len(lines) can be less than the islice cap. | ||
| # When truncated, the true directory size is unknown. | ||
| total = len(lines) | ||
| truncated = total > MAX_ENTRIES | ||
| if truncated: |
There was a problem hiding this comment.
The new comment notes _classify_entry may drop entries after _list_sync caps scanning at MAX_ENTRIES + 1, but truncation is currently computed from len(lines) > MAX_ENTRIES. If many entries are dropped, the scan can still hit the cap while len(lines) stays <= MAX_ENTRIES, causing truncated=False and total_entries to be misleading even though the listing may be incomplete. Consider tracking truncation based on the raw entries length (or returning a separate truncated_raw flag / sentinel) rather than the post-classification lines length.
| def test_rejects_empty_string(self) -> None: | ||
| with pytest.raises(ValueError, match="non-empty absolute"): | ||
| SubprocessSandboxConfig(extra_safe_path_prefixes=("",)) | ||
|
|
||
| def test_rejects_relative_path(self) -> None: | ||
| with pytest.raises(ValueError, match="non-empty absolute"): | ||
| SubprocessSandboxConfig( | ||
| extra_safe_path_prefixes=("relative/path",), | ||
| ) |
There was a problem hiding this comment.
These config-validation tests use pytest.raises(ValueError), which will also match pydantic.ValidationError and makes it less clear that this is Pydantic boundary validation (and consistent with tests/unit/tools/sandbox/test_config.py). Consider asserting ValidationError explicitly (and importing it) so failures don’t accidentally pass due to unrelated ValueErrors.
src/ai_company/engine/shutdown.py
Outdated
| logger.debug( | ||
| EXECUTION_SHUTDOWN_TASK_ERROR, | ||
| error=(f"Post-cancel task exception: {type(exc).__name__}"), |
There was a problem hiding this comment.
_log_post_cancel_exceptions() currently logs only the exception type name. For post-cancel failures this is often not actionable without the exception message and some task identity (e.g., task.get_name() / id(task)). Consider logging str(exc) (or repr(exc)) and a task identifier to make these DEBUG logs useful for diagnosing shutdown issues.
| logger.debug( | |
| EXECUTION_SHUTDOWN_TASK_ERROR, | |
| error=(f"Post-cancel task exception: {type(exc).__name__}"), | |
| task_name = getattr(task, "get_name", None) | |
| task_name = task_name() if callable(task_name) else None | |
| logger.debug( | |
| EXECUTION_SHUTDOWN_TASK_ERROR, | |
| error=( | |
| f"Post-cancel task exception: " | |
| f"{type(exc).__name__}: {exc}" | |
| ), | |
| task_name=task_name, | |
| task_id=id(task), | |
| error_message=str(exc), |
Greptile SummaryThis PR is a well-scoped hardening pass across the subprocess, sandbox, and shutdown layers. It eliminates duplication via the new Key changes:
Confidence Score: 5/5
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph GitTool["_BaseGitTool._run_git_direct()"]
A[Start subprocess] --> B{OSError?}
B -- yes --> ERR1[ToolExecutionResult is_error=True]
B -- no --> C[wait_for communicate timeout=deadline]
C --> D{TimeoutError?}
D -- no --> G[_process_git_output]
D -- yes --> E[proc.kill]
E --> F["wait_for communicate timeout=5s\n+ _sanitize_stderr()"]
F --> ERR2["ToolExecutionResult is_error=True\n(with sanitized stderr fragment)"]
C --> FINALLY[finally: close_subprocess_transport]
F --> FINALLY
G --> FINALLY
end
subgraph Sandbox["SubprocessSandbox.execute()"]
S1[_validate_cwd] --> S2[_build_filtered_env\nallowlist + denylist + PATH filter]
S2 --> S3[_spawn_process\nstart_new_session=True on Unix]
S3 --> S4[_communicate_with_timeout]
S4 --> S5{TimeoutError?}
S5 -- yes --> S6["_kill_process\nkillpg on Unix / proc.kill fallback"]
S6 --> S7[_drain_after_kill timeout=5s]
S7 --> SFIN[finally: close_subprocess_transport]
S5 -- no --> SFIN
end
subgraph Shutdown["ShutdownManager signal handling"]
SH1{Platform?}
SH1 -- Unix --> SH2["loop.add_signal_handler\n→ _handle_signal"]
SH1 -- Windows --> SH3["signal.signal\n→ _handle_signal_threadsafe"]
SH3 --> SH4{Running loop?}
SH4 -- yes --> SH5["call_soon_threadsafe\n→ request_shutdown"]
SH4 -- no / RuntimeError --> SH6["request_shutdown() directly"]
SH6 --> SH7{Raised?}
SH7 -- yes --> SH8["sys.stderr.write fallback"]
SH8 --> SH9{stderr failed?}
SH9 -- yes --> SH10["silent pass (noqa S110)"]
end
Last reviewed commit: 940068d |
…, and Copilot Source fixes: - Move asyncio/types out of TYPE_CHECKING for PEP 649 compatibility (#1, #2) - Guard is_closing() inside try/except in _process_cleanup.py (#4) - Normalize all control chars (incl. newlines/tabs) in _sanitize_stderr (#6) - Apply stderr sanitization to sandbox git path too (#3) - Fix list_directory truncation to use raw scan cap, not post-classification count (#7) - Narrow except to InvalidStateError + log task identity in shutdown (#8) - Add loop.stop() fallback when request_shutdown() fails in signal handlers (#9) - Include zombie diagnostic in _drain_after_kill stderr output (#10) - Remove @staticmethod from _log_post_cancel_exceptions (#17) - Add _process_cleanup.py to DESIGN_SPEC §15.3 (#12) and §11.1.1 (#16) Test additions: - New test_process_cleanup.py: 7 tests covering all transport states (#5) - Add _sanitize_stderr truncation test (#14) - Add _log_post_cancel_exceptions tests (4 tests) (#11) - Add signal handler recovery tests (3 tests) (#15) - Use ValidationError instead of ValueError in config tests (#13) - Update existing tests for new sanitization behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/tools/_git_base.py`:
- Around line 84-91: The _sanitize_stderr function currently only strips control
characters which can still leave credentials like https://user:token@host/... in
logs; update _sanitize_stderr to additionally redact embedded credentials by
applying a regex to replace any user:password@ (or user@) credential portion in
URLs with a safe placeholder (e.g., [REDACTED] or remove credentials) before
collapsing control characters and truncating; reference the existing
_sanitize_stderr function, reuse/_extend the _CONTROL_CHAR_RE step and
_MAX_STDERR_FRAGMENT truncation, and add a URL credential regex pass to ensure
stderr and ToolExecutionResult.content cannot leak https://user:pass@ or similar
patterns.
In `@src/ai_company/tools/sandbox/subprocess_sandbox.py`:
- Around line 193-210: The current _get_safe_path_prefixes appends
SubprocessSandboxConfig.extra_safe_path_prefixes without validating them,
allowing repo-controlled directories to shadow system binaries; update
_get_safe_path_prefixes to only include extras that are absolute, exist, and are
vetted as trusted system locations (e.g., reside under known system roots or
match a maintained whitelist), and reject any paths that are writable by
non-root users or that are inside the workspace or other user-writable
directories; also ensure _spawn_process’s command resolution relies only on the
validated prefixes returned by _get_safe_path_prefixes so untrusted extras are
never used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7cbe6a4d-a113-4cd4-8617-4cc8a324543d
📒 Files selected for processing (11)
DESIGN_SPEC.mdsrc/ai_company/engine/shutdown.pysrc/ai_company/tools/_git_base.pysrc/ai_company/tools/_process_cleanup.pysrc/ai_company/tools/file_system/list_directory.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pytests/unit/engine/test_shutdown.pytests/unit/tools/file_system/test_list_directory.pytests/unit/tools/git/test_git_tools.pytests/unit/tools/sandbox/test_subprocess_sandbox.pytests/unit/tools/test_process_cleanup.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Never usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (without parentheses) as enforced by ruff for PEP 758 on Python 3.14
All public functions must have type hints and be compatible with mypy strict mode
All public classes and functions must have Google-style docstrings (enforced by ruff D rules)
Create new objects instead of mutating existing ones — implement immutability
For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention and usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one Pydantic model
Use Pydantic v2 withBaseModel,model_validator,computed_field, andConfigDict
Use@computed_fieldfor derived values instead of storing and validating redundant fields (e.g.,TokenUsage.total_tokens)
UseNotBlankStr(fromcore.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — use structured concurrency over barecreate_task
Maintain line length of 88 characters (enforced by ruff)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/tools/sandbox/test_subprocess_sandbox.pytests/unit/tools/test_process_cleanup.pytests/unit/tools/git/test_git_tools.pysrc/ai_company/tools/_process_cleanup.pysrc/ai_company/tools/file_system/list_directory.pysrc/ai_company/engine/shutdown.pytests/unit/engine/test_shutdown.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pytests/unit/tools/file_system/test_list_directory.pysrc/ai_company/tools/_git_base.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark unit tests with@pytest.mark.unit
Mark integration tests with@pytest.mark.integration
Mark end-to-end tests with@pytest.mark.e2e
Mark slow tests with@pytest.mark.slow
Maintain 80% minimum code coverage (enforced in CI)
All tests must complete within 30 seconds timeout
Usepytest-xdistvia-n autofor parallel test execution
Prefer@pytest.mark.parametrizefor testing similar cases
Usetest-provider,test-small-001(not vendor-specific names) in tests
Files:
tests/unit/tools/sandbox/test_subprocess_sandbox.pytests/unit/tools/test_process_cleanup.pytests/unit/tools/git/test_git_tools.pytests/unit/engine/test_shutdown.pytests/unit/tools/file_system/test_list_directory.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import logging viafrom ai_company.observability import get_loggerand createlogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code
Always name the logger variablelogger(not_logger, notlog)
Always use event name constants fromai_company.observability.eventsdomain-specific modules (e.g.,PROVIDER_CALL_STARTfromevents.provider) — import directly from the specific events module
Use structured logging withlogger.info(EVENT, key=value)syntax — never use format strings likelogger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG log level for object creation, internal flow, and entry/exit of key functions
All provider calls must go throughBaseCompletionProviderwhich applies retry and rate limiting automatically
Never implement retry logic in driver subclasses or calling code — retry is handled byBaseCompletionProviderbase class
SetRetryConfigandRateLimiterConfigper-provider inProviderConfig
Retryable errors includeRateLimitError,ProviderTimeoutError,ProviderConnectionError, andProviderInternalError— all withis_retryable=True
Non-retryable errors must raise immediately without retry
RetryExhaustedErrorsignals all retries have failed — engine layer catches this to trigger fallback chains
Rate limiter must respectRateLimitError.retry_afterfrom providers and automatically pause future requests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names:example-provider,example-large-001,example-medium-001,example-small-001, orlarge/medium/smallas aliases
Files:
src/ai_company/tools/_process_cleanup.pysrc/ai_company/tools/file_system/list_directory.pysrc/ai_company/engine/shutdown.pysrc/ai_company/tools/sandbox/subprocess_sandbox.pysrc/ai_company/tools/_git_base.py
🧠 Learnings (4)
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
tests/unit/tools/sandbox/test_subprocess_sandbox.py
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Applies to tests/**/*.py : All tests must complete within 30 seconds timeout
Applied to files:
tests/unit/tools/git/test_git_tools.py
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Update `DESIGN_SPEC.md` to reflect approved deviations when they occur
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-07T12:24:27.098Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T12:24:27.098Z
Learning: Applies to **/*.py : Handle errors explicitly — never silently swallow exceptions
Applied to files:
src/ai_company/engine/shutdown.py
🧬 Code graph analysis (7)
tests/unit/tools/sandbox/test_subprocess_sandbox.py (2)
src/ai_company/tools/sandbox/subprocess_sandbox.py (5)
config(108-110)SubprocessSandbox(61-533)workspace(113-115)_get_safe_path_prefixes(193-210)_build_filtered_env(212-255)src/ai_company/tools/sandbox/config.py (1)
SubprocessSandboxConfig(8-83)
tests/unit/tools/test_process_cleanup.py (1)
src/ai_company/tools/_process_cleanup.py (1)
close_subprocess_transport(17-44)
src/ai_company/tools/_process_cleanup.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
src/ai_company/engine/shutdown.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
tests/unit/engine/test_shutdown.py (1)
src/ai_company/engine/shutdown.py (9)
strategy(352-354)CooperativeTimeoutStrategy(117-325)_log_post_cancel_exceptions(249-278)ShutdownStrategy(82-114)request_shutdown(85-87)request_shutdown(142-144)ShutdownManager(328-500)_handle_signal(375-392)_handle_signal_threadsafe(394-441)
src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
src/ai_company/tools/_process_cleanup.py (1)
close_subprocess_transport(17-44)
src/ai_company/tools/_git_base.py (2)
src/ai_company/tools/_process_cleanup.py (1)
close_subprocess_transport(17-44)src/ai_company/tools/base.py (1)
ToolExecutionResult(24-53)
🔇 Additional comments (15)
src/ai_company/tools/file_system/list_directory.py (3)
74-113: LGTM! Truncation semantics correctly fixed.The new
raw_cappedreturn value properly separates raw scan truncation detection from the filtered line count. This addresses the prior concern where truncation was incorrectly inferred fromlen(lines) > MAX_ENTRIES, which could underreport when_classify_entrydrops entries. Now:
raw_cappedreflects whether the raw filesystem scan hitMAX_ENTRIEStotal_entriescorrectly counts only visible/classified entries- The truncation message and metadata accurately reflect the raw scan limit
231-265: LGTM!The
_format_listing_resultmethod correctly usesraw_cappedfor truncation signaling. The docstring accurately describes the parameter semantics.
293-321: LGTM!The
executemethod correctly unpacks the new tuple return from_list_syncand threadsraw_cappedthrough to_format_listing_result.tests/unit/tools/file_system/test_list_directory.py (1)
109-120: LGTM!The assertion change from
> MAX_ENTRIESto== MAX_ENTRIESis correct. The test creates plain text files that all pass_classify_entry, so exactlyMAX_ENTRIESentries appear in the output when truncated.tests/unit/tools/sandbox/test_subprocess_sandbox.py (1)
448-522: LGTM! Comprehensive test coverage forextra_safe_path_prefixes.The tests thoroughly cover:
- Integration with
_get_safe_path_prefixes()and_build_filtered_env()- Platform-specific behavior (Windows vs Unix paths)
- Boundary validation (empty string and relative path rejection via
ValidationError)- Default behavior preservation
tests/unit/engine/test_shutdown.py (2)
420-459: LGTM! Good coverage for_log_post_cancel_exceptions.The tests verify the helper correctly:
- Skips cancelled tasks without calling
exception()- Retrieves and logs exceptions from non-cancelled tasks
- Handles tasks with no exception gracefully
- Catches
InvalidStateErrorwithout propagating
464-504: LGTM! Signal handler recovery tests are comprehensive.The tests verify resilience in edge cases:
- Unix path falls back to
loop.stop()whenrequest_shutdown()raises- Windows no-loop path writes to stderr as last resort
- Stderr write failures are suppressed (no exception raised)
tests/unit/tools/git/test_git_tools.py (1)
729-816: LGTM! Good test coverage for stderr sanitization on timeout.The tests thoroughly verify:
- Stderr fragment inclusion in timeout messages
- Control character sanitization (null, bell, unit separator removed)
- Truncation to 500 characters
The mocking pattern correctly simulates timeout scenarios by having the first
communicate()call block indefinitely while the second returns the stderr payload.tests/unit/tools/test_process_cleanup.py (1)
12-70: LGTM! Comprehensive test coverage forclose_subprocess_transport.The tests cover all edge cases including:
- No-op paths (None/missing
_transport, already closing)- Normal operation (transport closed when open)
- Exception suppression (from both
is_closing()andclose())- Resilience to non-transport objects in
_transportsrc/ai_company/tools/_process_cleanup.py (1)
17-44: LGTM! Past review concern addressed.The
is_closing()call is now correctly guarded inside thetry/exceptblock (line 37), addressing the prior concern about unguarded exceptions. The implementation:
- Safely accesses
_transportviagetattr- Returns early if transport is
None- Guards both
is_closing()andclose()calls with exception handling- Logs failures at DEBUG with
exc_info=Truefor observabilitysrc/ai_company/engine/shutdown.py (4)
249-278: LGTM! Post-cancel exception logging implemented correctly.The
_log_post_cancel_exceptionshelper:
- Retrieves exceptions to prevent asyncio's "Task exception was never retrieved" warning
- Logs non-cancelled tasks with exceptions at DEBUG level (appropriate for shutdown diagnostics)
- Handles
InvalidStateErrorgracefully with task name contextThis addresses the past review concern about silently swallowing exceptions.
375-392: LGTM! Unix signal handler with resilient fallback.The handler correctly:
- Logs the signal reception
- Attempts
request_shutdown()- Falls back to
loop.stop()with suppressed exceptions as a last resort to ensure the process can exitThe
contextlib.suppress(Exception)on line 391 is appropriate here — it's the absolute last-resort fallback to prevent a stuck process.
394-441: LGTM! Windows signal handler with comprehensive fallback chain.The handler implements a resilient fallback chain:
- Try to schedule via
call_soon_threadsafe(preferred)- If no loop, call
request_shutdown()directly- If that fails, write to stderr (avoids structlog's lock acquisition)
- If stderr fails, silently pass (last resort)
The
except Exception: passon line 440-441 is intentional and acceptable — at this point, all fallback mechanisms have failed, and the process is in an unrecoverable signal-handling state. The# noqa: S110acknowledges this intentional suppression.
306-312: LGTM! Cleanup failure logging uses distinct event constant.The switch to
EXECUTION_SHUTDOWN_CLEANUP_FAILEDfor callback exceptions provides better observability by distinguishing cleanup failures from normal cleanup progress.src/ai_company/tools/sandbox/subprocess_sandbox.py (1)
468-481: No review comment provided. Please provide the review comment within<review_comment>tags along with any relevant context (repository information, files being reviewed, etc.) to proceed with the verification and rewriting task.
| def _get_safe_path_prefixes(self) -> tuple[str, ...]: | ||
| """Return safe PATH prefixes for the current platform. | ||
|
|
||
| Combines built-in platform defaults with any extra prefixes | ||
| from ``SubprocessSandboxConfig.extra_safe_path_prefixes``. | ||
| """ | ||
| defaults: tuple[str, ...] | ||
| if os.name == "nt": | ||
| system_root = os.environ.get("SYSTEMROOT", r"C:\WINDOWS") | ||
| return ( | ||
| defaults = ( | ||
| system_root, | ||
| str(Path(system_root) / "system32"), | ||
| r"C:\Program Files\Git", | ||
| r"C:\Program Files (x86)\Git", | ||
| ) | ||
| return ("/usr/bin", "/usr/local/bin", "/bin", "/usr/sbin", "/sbin") | ||
| else: | ||
| defaults = ("/usr/bin", "/usr/local/bin", "/bin", "/usr/sbin", "/sbin") | ||
| return defaults + self._config.extra_safe_path_prefixes |
There was a problem hiding this comment.
Reject repo-controlled PATH prefixes.
Appending arbitrary absolute extra_safe_path_prefixes weakens the “system binaries only” guarantee. Because _spawn_process() still resolves bare commands through PATH, a prefix under the workspace or another writable directory lets a repo shadow git or python with its own executable. Validate these extras as trusted system locations, not just “absolute”.
As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/tools/sandbox/subprocess_sandbox.py` around lines 193 - 210,
The current _get_safe_path_prefixes appends
SubprocessSandboxConfig.extra_safe_path_prefixes without validating them,
allowing repo-controlled directories to shadow system binaries; update
_get_safe_path_prefixes to only include extras that are absolute, exist, and are
vetted as trusted system locations (e.g., reside under known system roots or
match a maintained whitelist), and reject any paths that are writable by
non-root users or that are inside the workspace or other user-writable
directories; also ensure _spawn_process’s command resolution relies only on the
validated prefixes returned by _get_safe_path_prefixes so untrusted extras are
never used.
…results Apply _CREDENTIAL_RE to stderr fragments to prevent leaking https://user:token@host URLs in timeout error messages and logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| proc = MagicMock() | ||
| # _transport is a string (not a transport object) | ||
| type(proc)._transport = PropertyMock(return_value="not-a-transport") | ||
| # Should not raise | ||
| close_subprocess_transport(proc) |
There was a problem hiding this comment.
type(proc)._transport = PropertyMock(...) mutates the global unittest.mock.MagicMock class (since type(proc) is MagicMock), which can leak into unrelated tests and even make later assignments like mock._transport = None raise AttributeError. Use a dedicated dummy class/instance with a property, or patch the attribute via patch.object(..., new_callable=PropertyMock) in a context manager so it is restored after the test.
🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
_process_cleanup.py) — eliminates duplication between_git_base.pyandsubprocess_sandbox.py, with guarded exception handling to never mask primary resultsfield_validatorforextra_safe_path_prefixesconfig — reject empty strings and relative paths at the Pydantic validation boundary_log_post_cancel_exceptions()to reduce complexity, log post-cancel task exceptions at DEBUG instead of silently discarding, addstderr.writefallback for Windows signal handler when no event loop is availablekillpgOSError in sandbox kill fallback instead of silently suppressinglist_directory.pyabouttotalcount semantics_process_cleanup.pyTest plan
test_timeout_includes_stderr_fragment,test_timeout_sanitizes_stderr_control_charsTestExtraSafePathPrefixes(5 tests — inclusion, filtering, defaults, empty string rejection, relative path rejection)Review coverage
Pre-reviewed by 10 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency. 15 findings triaged and addressed (11 from PR changes + 4 pre-existing).
🤖 Generated with Claude Code