Skip to content

feat(tools): wire per-category sandbox backend selection#534

Merged
Aureliolo merged 7 commits intomainfrom
feat/sandbox-backend-wiring
Mar 18, 2026
Merged

feat(tools): wire per-category sandbox backend selection#534
Aureliolo merged 7 commits intomainfrom
feat/sandbox-backend-wiring

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • Add sandbox backend factory (sandbox/factory.py) with three functions: build_sandbox_backends (instantiates only needed backends), resolve_sandbox_for_category (per-category backend lookup), cleanup_sandbox_backends (parallel cleanup via TaskGroup with error isolation)
  • Wire per-category sandbox resolution into build_default_tools_from_config with 3-tier priority: explicit sandbox > explicit sandbox_backends map > auto-build from config.sandboxing
  • Currently wires VERSION_CONTROL category (git tools); other categories will be wired as their tool builders are added
  • Add 3 sandbox factory event constants, update __init__.py exports, update CLAUDE.md Package Structure

Closes #265

Test plan

  • Unit tests for build_sandbox_backends (7 tests: subprocess-only, docker-only, mixed, sub-configs, deduplication, MappingProxyType return)
  • Unit tests for resolve_sandbox_for_category (4 tests: override, default fallback, all ToolCategory parametrized, KeyError on missing backend)
  • Unit tests for cleanup_sandbox_backends (4 tests: all backends, empty, single, error isolation)
  • Unit tests for factory wiring (8 tests: default/docker config, explicit sandbox precedence, both params, backends map, auto-build, FS unaffected, tool count)
  • Integration tests (4 tests: YAML default, docker default, per-category override, subprocess config propagation)
  • Existing factory tests pass (all 48 factory-related tests green)
  • Full suite: 9111 passed, 94.48% coverage
  • Lint (ruff), format (ruff), type-check (mypy strict) all clean
  • Pre-reviewed by 11 agents, 12 findings addressed

Review coverage

Agent Findings
code-reviewer 2
python-reviewer 4
conventions-enforcer 3
logging-audit 0
resilience-audit 0
pr-test-analyzer 5
type-design-analyzer 7
async-concurrency-reviewer 3
test-quality-reviewer 8
docs-consistency 1
issue-resolution-verifier 0

…ctory

Bridge the config-to-runtime gap for SandboxingConfig by adding a
sandbox backend factory that builds only needed backends, resolves
the correct backend per tool category, and wires it into
build_default_tools_from_config with backward-compatible fallback.

Closes #265
Pre-reviewed by 11 agents, 12 findings addressed:
- cleanup_sandbox_backends: error isolation + TaskGroup for parallel cleanup
- resolve_sandbox_for_category: descriptive KeyError on missing backend
- build_sandbox_backends: return MappingProxyType for immutability
- DockerSandbox.get_backend_type: use NotBlankStr for consistency
- CLAUDE.md: update Package Structure with sandbox factory
- Tests: remove redundant @pytest.mark.asyncio, add guard assertions,
  fix test_tool_count_unchanged to test both code paths, add missing
  error-path tests
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 17, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Per-category sandbox backend selection so tool categories can use Docker or subprocess sandboxes.
  • Observability
    • Added sandbox lifecycle events for built, build-failed, resolve, resolve-failed, cleanup, and cleanup-failed to improve monitoring.
  • Documentation
    • Updated architecture and operations docs to describe per-category sandboxing and wiring.
  • Tests
    • Added unit and integration tests covering sandbox building, resolution, cleanup, and wiring.

Walkthrough

Adds a sandbox backend factory (build, resolve, cleanup), wires per-category sandbox resolution into the tools factory, exposes the sandbox API, adds sandbox observability event constants, and includes unit/integration tests plus documentation updates.

Changes

Cohort / File(s) Summary
Sandbox Backend Factory
src/synthorg/tools/sandbox/factory.py
New module: build_sandbox_backends, resolve_sandbox_for_category, cleanup_sandbox_backends — builds only referenced backends from SandboxingConfig, resolves by ToolCategory, and performs async parallel cleanup with per-backend error handling.
Factory Integration
src/synthorg/tools/factory.py
build_default_tools_from_config signature gains `sandbox_backends: Mapping[str, SandboxBackend]
Public API Exports
src/synthorg/tools/sandbox/__init__.py
Exports new sandbox factory functions (build_sandbox_backends, resolve_sandbox_for_category, cleanup_sandbox_backends) and updates __all__.
Observability Events
src/synthorg/observability/events/sandbox.py
Adds six new event constants: SANDBOX_FACTORY_BUILT, SANDBOX_FACTORY_BUILD_FAILED, SANDBOX_FACTORY_RESOLVE, SANDBOX_FACTORY_RESOLVE_FAILED, SANDBOX_FACTORY_CLEANUP, SANDBOX_FACTORY_CLEANUP_FAILED.
Docker Sandbox Typing
src/synthorg/tools/sandbox/docker_sandbox.py
Moves NotBlankStr import to module scope and returns NotBlankStr("docker") from get_backend_type().
Tests — Unit & Integration
tests/unit/tools/sandbox/test_sandbox_factory.py, tests/unit/tools/test_factory_sandbox_wiring.py, tests/integration/tools/test_sandbox_wiring_integration.py
New unit and integration tests covering backend building, per-category resolution and precedence, error cases, async cleanup behavior, and factory wiring; extensive use of mocks and async tests.
Docs & Notes
docs/architecture/tech-stack.md, docs/design/operations.md, CLAUDE.md
Documentation updated to describe per-category sandbox backend selection, the sandbox factory functions, and wiring choices.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/Engine
    participant Factory as Tools Factory
    participant SandboxFactory as Sandbox Factory
    participant Config as SandboxingConfig
    participant Backends as Backend Instances
    participant Tool as Tool Instance

    Client->>Factory: build_default_tools_from_config(config, sandbox_backends=None)
    Factory->>SandboxFactory: build_sandbox_backends(config, workspace)
    SandboxFactory->>Config: read default_backend and overrides
    SandboxFactory->>Backends: instantiate required backends
    SandboxFactory-->>Factory: return Mapping[str, SandboxBackend]

    loop For each tool needing sandbox
        Factory->>SandboxFactory: resolve_sandbox_for_category(config, backends, ToolCategory)
        SandboxFactory->>Config: get_backend_for_category(category.value)
        SandboxFactory->>Backends: lookup backend by name
        SandboxFactory-->>Factory: return SandboxBackend
        Factory->>Tool: instantiate tool with resolved sandbox
    end

    Factory-->>Client: return tools with sandboxes wired
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #265 — feat: wire engine-level per-category sandbox backend selection from config: This PR implements engine-side sandbox backend construction and per-category wiring described by the issue.

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main feature: per-category sandbox backend selection wiring.
Description check ✅ Passed The description is detailed and directly related to the changeset, explaining factory functions, wiring logic, test coverage, and closing issue #265.
Linked Issues check ✅ Passed The PR fully addresses the objectives from issue #265: implements config-driven per-category backend selection with factory functions, resolves backends by category with fallback logic, and wires VERSION_CONTROL category.
Out of Scope Changes check ✅ Passed All changes are in-scope: factory implementation, event constants, wiring into build_default_tools_from_config, tests, and documentation updates—all directly support per-category sandbox backend selection.
Docstring Coverage ✅ Passed Docstring coverage is 97.30% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sandbox-backend-wiring
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/sandbox-backend-wiring
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the sandboxing capabilities by introducing a dedicated factory for managing sandbox backends. This allows for more granular control, enabling different tool categories to utilize specific sandbox implementations based on configuration. The change improves flexibility and resource management by only instantiating required backends and providing robust cleanup, laying the groundwork for more sophisticated tool isolation strategies.

Highlights

  • New Sandbox Backend Factory: Introduced a new sandbox backend factory (sandbox/factory.py) with functions to build only necessary backends, resolve specific backends per tool category, and clean up active sandboxes in parallel with error isolation.
  • Per-Category Sandbox Resolution: Wired the new per-category sandbox resolution logic into build_default_tools_from_config, establishing a 3-tier priority: explicit sandbox parameter, explicit sandbox_backends map, or auto-building from config.sandboxing.
  • Initial Wiring for Version Control Tools: The VERSION_CONTROL tool category (git tools) has been initially wired to utilize this new per-category sandbox selection mechanism, with other categories planned for future integration.
  • New Observability Events: Added three new event constants (SANDBOX_FACTORY_BUILT, SANDBOX_FACTORY_RESOLVE, SANDBOX_FACTORY_CLEANUP) to src/synthorg/observability/events/sandbox.py for tracking sandbox factory operations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 17, 2026 23:42 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a sandbox backend factory for per-category backend selection, a significant improvement for configurability. The implementation is high-quality, with robust logic, good use of modern asyncio features like TaskGroup, and exceptionally thorough unit and integration tests. I have one suggestion to improve the extensibility of the new factory.

Comment on lines +55 to +67
backends: dict[str, SandboxBackend] = {}

if "subprocess" in needed:
backends["subprocess"] = SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)

if "docker" in needed:
backends["docker"] = DockerSandbox(
config=config.docker,
workspace=workspace,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation uses a series of if statements to build the backend instances. This pattern can become cumbersome to maintain as more backend types are added. A more extensible approach would be to use a registry pattern (a dictionary mapping backend names to their constructor classes). This would centralize the backend registration and make the building logic more generic and concise. The suggested implementation uses a dictionary comprehension, assuming that the attribute name in SandboxingConfig for a backend's specific configuration matches the backend's name (e.g., config.docker for the 'docker' backend), which is the current convention.

Suggested change
backends: dict[str, SandboxBackend] = {}
if "subprocess" in needed:
backends["subprocess"] = SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)
if "docker" in needed:
backends["docker"] = DockerSandbox(
config=config.docker,
workspace=workspace,
)
backends: dict[str, SandboxBackend] = {
name: builder(config=getattr(config, name), workspace=workspace)
for name, builder in {
"subprocess": SubprocessSandbox,
"docker": DockerSandbox,
}.items()
if name in needed
}

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 82.35294% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.16%. Comparing base (0987dd1) to head (a31ff5b).
⚠️ Report is 4 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/tools/sandbox/factory.py 77.35% 9 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #534      +/-   ##
==========================================
- Coverage   93.19%   93.16%   -0.03%     
==========================================
  Files         522      523       +1     
  Lines       25247    25314      +67     
  Branches     2396     2402       +6     
==========================================
+ Hits        23530    23585      +55     
- Misses       1360     1369       +9     
- Partials      357      360       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/tools/sandbox/docker_sandbox.py`:
- Around line 645-647: The get_backend_type function currently constructs
NotBlankStr("docker") incorrectly (NotBlankStr is a type
alias/Annotated[str,...]); change it to return a plain string instead — replace
NotBlankStr("docker") with "docker" and annotate the return to satisfy typing
(e.g. add a type ignore comment like `# type: ignore[return-value]` or use
typing.cast(NotBlankStr, "docker")) so get_backend_type returns a plain str
while preserving the NotBlankStr return annotation.

In `@tests/unit/tools/sandbox/test_sandbox_factory.py`:
- Around line 22-23: Add a 30-second pytest timeout marker to the test classes
that currently only have `@pytest.mark.unit` so each test gets an explicit cap;
specifically, add `@pytest.mark.timeout`(30) stacked with the existing
`@pytest.mark.unit` above the TestBuildSandboxBackends class (and the other two
class declarations referenced at the same spots in the file) so the decorators
for those classes include both markers.
- Around line 224-239: Update the test_raises_key_error_when_backend_missing
assertion to verify the KeyError message contains descriptive diagnostics: when
calling resolve_sandbox_for_category with config (SandboxingConfig) and category
ToolCategory.VERSION_CONTROL and backends map, assert the raised KeyError
message includes both the missing backend name ("docker" or config.overrides
value), the category identifier (e.g., "version_control" or
ToolCategory.VERSION_CONTROL), and the list of available backends (from the
backends keys) so the match is more specific and will fail if the descriptive
message regresses.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4c4eef4c-9d30-4c31-9aed-0ed66543f6e2

📥 Commits

Reviewing files that changed from the base of the PR and between 0987dd1 and 64600a7.

📒 Files selected for processing (9)
  • CLAUDE.md
  • src/synthorg/observability/events/sandbox.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/sandbox/__init__.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/tools/sandbox/factory.py
  • tests/integration/tools/test_sandbox_wiring_integration.py
  • tests/unit/tools/sandbox/test_sandbox_factory.py
  • tests/unit/tools/test_factory_sandbox_wiring.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Deploy Preview
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (frozen=True, ConfigDict) for config and identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Use Pydantic v2 conventions: @computed_field for derived values instead of storing + validating redundant fields (e.g., TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...

Files:

  • src/synthorg/observability/events/sandbox.py
  • src/synthorg/tools/sandbox/factory.py
  • src/synthorg/tools/sandbox/__init__.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/tools/factory.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use cd in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly

Files:

  • src/synthorg/observability/events/sandbox.py
  • tests/integration/tools/test_sandbox_wiring_integration.py
  • src/synthorg/tools/sandbox/factory.py
  • src/synthorg/tools/sandbox/__init__.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • tests/unit/tools/sandbox/test_sandbox_factory.py
  • src/synthorg/tools/factory.py
  • tests/unit/tools/test_factory_sandbox_wiring.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Maintain 80% minimum code coverage (enforced in CI)
Use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed
Set 30 second timeout per test
ALWAYS include -n auto when running pytest with pytest-xdist, never run tests sequentially
Prefer @pytest.mark.parametrize for testing similar cases
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests. Use generic names: test-provider, test-small-001, etc. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names
Use Hypothesis for property-based testing with @given + @settings. Control via HYPOTHESIS_PROFILE env var: ci (200 examples, default) or dev (1000 examples). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins

Files:

  • tests/integration/tools/test_sandbox_wiring_integration.py
  • tests/unit/tools/sandbox/test_sandbox_factory.py
  • tests/unit/tools/test_factory_sandbox_wiring.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

For Markdown documentation, always read the relevant docs/design/ page when implementing features to ensure alignment with design specifications

Files:

  • CLAUDE.md
🧠 Learnings (3)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/sandbox.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators

Applied to files:

  • src/synthorg/tools/sandbox/docker_sandbox.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement

Applied to files:

  • src/synthorg/tools/factory.py
🧬 Code graph analysis (5)
tests/integration/tools/test_sandbox_wiring_integration.py (2)
src/synthorg/config/loader.py (1)
  • load_config_from_string (495-524)
src/synthorg/tools/factory.py (1)
  • build_default_tools_from_config (136-202)
src/synthorg/tools/sandbox/factory.py (2)
src/synthorg/tools/sandbox/docker_sandbox.py (4)
  • DockerSandbox (70-647)
  • config (113-115)
  • workspace (118-120)
  • cleanup (601-620)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
tests/unit/tools/sandbox/test_sandbox_factory.py (2)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
src/synthorg/tools/sandbox/factory.py (3)
  • build_sandbox_backends (33-75)
  • cleanup_sandbox_backends (126-153)
  • resolve_sandbox_for_category (78-123)
src/synthorg/tools/factory.py (2)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
src/synthorg/tools/sandbox/factory.py (2)
  • build_sandbox_backends (33-75)
  • resolve_sandbox_for_category (78-123)
tests/unit/tools/test_factory_sandbox_wiring.py (3)
src/synthorg/tools/sandbox/docker_sandbox.py (2)
  • config (113-115)
  • workspace (118-120)
src/synthorg/config/schema.py (1)
  • RootConfig (387-651)
src/synthorg/tools/factory.py (1)
  • build_default_tools_from_config (136-202)
🔇 Additional comments (7)
src/synthorg/observability/events/sandbox.py (1)

17-19: Looks good — new sandbox factory events are consistent.

The added SANDBOX_FACTORY_* constants follow the existing event naming scheme and fit this domain module cleanly.

CLAUDE.md (1)

136-136: Documentation update is accurate and useful.

The package structure entry clearly documents the new sandbox factory surface and responsibilities.

src/synthorg/tools/sandbox/__init__.py (1)

7-11: Public sandbox API export wiring is clean.

Imports and __all__ were updated consistently, so the new factory helpers are exposed as intended.

Also applies to: 28-30

tests/integration/tools/test_sandbox_wiring_integration.py (1)

14-135: Integration coverage for sandbox wiring is solid.

The test set exercises default, override, and config-propagation paths through load_config_from_string + build_default_tools_from_config effectively.

src/synthorg/tools/factory.py (1)

136-202: Per-category sandbox resolution wiring is correctly prioritized.

The fallback order and VERSION_CONTROL resolution path are clear and consistent with the stated behavior.

src/synthorg/tools/sandbox/factory.py (1)

33-75: Factory implementation is well-structured and aligned with the objective.

build, resolve, and cleanup responsibilities are clearly separated, with immutable backend mapping and structured parallel cleanup.

Also applies to: 78-123, 126-153

tests/unit/tools/test_factory_sandbox_wiring.py (1)

49-228: Unit test coverage for sandbox wiring precedence is comprehensive.

The cases validate explicit override behavior, provided backend-map behavior, auto-build behavior, and unaffected tool categories.

Comment on lines 645 to +647
def get_backend_type(self) -> NotBlankStr:
"""Return ``'docker'``."""
return "docker"
return NotBlankStr("docker")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== NotBlankStr definition =="
rg -n -C2 '^\s*NotBlankStr\s*=' src/synthorg/core/types.py

echo
echo "== get_backend_type implementations =="
rg -n -C2 '\bdef get_backend_type\s*\(' src/synthorg/tools/sandbox

echo
echo "== Constructor-style NotBlankStr usage =="
rg -n '\bNotBlankStr\s*\(' src/synthorg

Repository: Aureliolo/synthorg

Length of output: 9129


Return a plain string instead of calling NotBlankStr as a constructor.

NotBlankStr is a type alias (Annotated[str, ...]), not a callable type. While NotBlankStr("docker") won't fail at runtime, it's incorrect usage. Instead, return the plain string:

def get_backend_type(self) -> NotBlankStr:
    """Return ``'docker'``."""
    return "docker"  # type: ignore[return-value]

Or use cast for clarity:

from typing import cast
return cast(NotBlankStr, "docker")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/sandbox/docker_sandbox.py` around lines 645 - 647, The
get_backend_type function currently constructs NotBlankStr("docker") incorrectly
(NotBlankStr is a type alias/Annotated[str,...]); change it to return a plain
string instead — replace NotBlankStr("docker") with "docker" and annotate the
return to satisfy typing (e.g. add a type ignore comment like `# type:
ignore[return-value]` or use typing.cast(NotBlankStr, "docker")) so
get_backend_type returns a plain str while preserving the NotBlankStr return
annotation.

Comment on lines +22 to +23
@pytest.mark.unit
class TestBuildSandboxBackends:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a 30s timeout marker for these test classes

These classes currently only carry @pytest.mark.unit. Add a timeout marker so each contained test has an explicit 30-second cap.

Suggested class-level timeout markers
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestBuildSandboxBackends:
@@
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestResolveSandboxForCategory:
@@
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestCleanupSandboxBackends:

As per coding guidelines, "Set 30 second timeout per test".

Also applies to: 158-159, 242-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/sandbox/test_sandbox_factory.py` around lines 22 - 23, Add a
30-second pytest timeout marker to the test classes that currently only have
`@pytest.mark.unit` so each test gets an explicit cap; specifically, add
`@pytest.mark.timeout`(30) stacked with the existing `@pytest.mark.unit` above the
TestBuildSandboxBackends class (and the other two class declarations referenced
at the same spots in the file) so the decorators for those classes include both
markers.

Comment on lines +224 to +239
def test_raises_key_error_when_backend_missing(self) -> None:
"""KeyError with descriptive message when backend not in map."""
config = SandboxingConfig(
default_backend="subprocess",
overrides={"version_control": "docker"},
)
backends: Mapping[str, SandboxBackend] = {
"subprocess": MagicMock(spec=SandboxBackend),
}

with pytest.raises(KeyError, match="docker"):
resolve_sandbox_for_category(
config=config,
backends=backends,
category=ToolCategory.VERSION_CONTROL,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Strengthen the KeyError assertion to lock in descriptive diagnostics

match="docker" is too broad and would still pass if the message regresses to a bare backend name. Assert key message fragments (category + available backends) to protect the intended behavior.

Suggested test assertion hardening
-        with pytest.raises(KeyError, match="docker"):
+        with pytest.raises(
+            KeyError,
+            match=r"resolved for category 'version_control'.*available: \['subprocess'\]",
+        ):
             resolve_sandbox_for_category(
                 config=config,
                 backends=backends,
                 category=ToolCategory.VERSION_CONTROL,
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_raises_key_error_when_backend_missing(self) -> None:
"""KeyError with descriptive message when backend not in map."""
config = SandboxingConfig(
default_backend="subprocess",
overrides={"version_control": "docker"},
)
backends: Mapping[str, SandboxBackend] = {
"subprocess": MagicMock(spec=SandboxBackend),
}
with pytest.raises(KeyError, match="docker"):
resolve_sandbox_for_category(
config=config,
backends=backends,
category=ToolCategory.VERSION_CONTROL,
)
def test_raises_key_error_when_backend_missing(self) -> None:
"""KeyError with descriptive message when backend not in map."""
config = SandboxingConfig(
default_backend="subprocess",
overrides={"version_control": "docker"},
)
backends: Mapping[str, SandboxBackend] = {
"subprocess": MagicMock(spec=SandboxBackend),
}
with pytest.raises(
KeyError,
match=r"resolved for category 'version_control'.*available: \['subprocess'\]",
):
resolve_sandbox_for_category(
config=config,
backends=backends,
category=ToolCategory.VERSION_CONTROL,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/sandbox/test_sandbox_factory.py` around lines 224 - 239,
Update the test_raises_key_error_when_backend_missing assertion to verify the
KeyError message contains descriptive diagnostics: when calling
resolve_sandbox_for_category with config (SandboxingConfig) and category
ToolCategory.VERSION_CONTROL and backends map, assert the raised KeyError
message includes both the missing backend name ("docker" or config.overrides
value), the category identifier (e.g., "version_control" or
ToolCategory.VERSION_CONTROL), and the list of available backends (from the
backends keys) so the match is more specific and will fail if the descriptive
message regresses.

…abbit

- Replace TaskGroup with gather(return_exceptions=True) in
  cleanup_sandbox_backends for shutdown-resilient error isolation
- Make cleanup_sandbox_backends keyword-only for API consistency
- Add SANDBOX_FACTORY_RESOLVE_FAILED and SANDBOX_FACTORY_CLEANUP_FAILED
  event constants for distinct success/error log filtering
- Extract _resolve_vc_sandbox helper to keep build_default_tools_from_config
  under 50-line limit
- Add KeyError to Raises section of build_default_tools_from_config docstring
- Change `from None` to `from exc` in resolve_sandbox_for_category
  to preserve exception chain
- Add unknown backend name validation in build_sandbox_backends
- Add cleanup logging assertion in test_cleanup_continues_on_error
- Strengthen KeyError assertion to lock in diagnostic message content
- Update docs/architecture/tech-stack.md: DockerSandbox no longer "planned"
- Update docs/design/operations.md: document sandbox factory implementation
- Add explanatory comment to type: ignore in test helper
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/unit/tools/sandbox/test_sandbox_factory.py (1)

23-24: ⚠️ Potential issue | 🟡 Minor

Timeout markers are still missing on the unit test classes.

Please add @pytest.mark.timeout(30) to each test class.

✅ Suggested change
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestBuildSandboxBackends:
@@
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestResolveSandboxForCategory:
@@
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestCleanupSandboxBackends:

As per coding guidelines, "Set 30 second timeout per test".

Also applies to: 159-160, 249-250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/sandbox/test_sandbox_factory.py` around lines 23 - 24, The
test classes lack the required timeout decorator; add `@pytest.mark.timeout`(30)
above each unit test class definition (e.g., add it above class
TestBuildSandboxBackends and the other test classes referenced) so every test
class has a 30-second timeout; ensure you import pytest if not already present
and place the decorator directly above the class declaration to apply the
timeout to all contained tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/tools/factory.py`:
- Around line 146-155: The current code eagerly calls build_sandbox_backends
when sandbox_backends is None, which constructs all configured backends
(including Docker) even though only ToolCategory.VERSION_CONTROL is needed;
change the flow to only build or resolve the backends required for
VERSION_CONTROL: either pass sandbox_backends=None into
resolve_sandbox_for_category so it lazily constructs only the requested backend,
or update/build a new helper (e.g., build_sandbox_backends_for_category or
extend build_sandbox_backends with a categories/categories_allowed parameter)
and call that with category=ToolCategory.VERSION_CONTROL (keeping
config=config.sandboxing and workspace=workspace) before calling
resolve_sandbox_for_category; update references to sandbox_backends,
build_sandbox_backends, resolve_sandbox_for_category, and
ToolCategory.VERSION_CONTROL accordingly.

In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 171-174: Replace the use of asyncio.gather for the fan-out/fan-in
cleanup with structured concurrency using asyncio.TaskGroup: inside the async
function where backend_items and _cleanup_one are used, create an
asyncio.TaskGroup, start tasks for each (_cleanup_one(n, b)) via
tg.create_task(...) (or assign to a locals list if you need results), await the
TaskGroup context so exceptions propagate in a structured way, and collect
results or handle exceptions after the with asyncio.TaskGroup() as tg block;
update any code that expects the previous results variable accordingly and
ensure return_exceptions semantics are handled explicitly (e.g., try/except per
task) if you need to preserve them.

In `@tests/unit/tools/test_factory_sandbox_wiring.py`:
- Around line 49-50: Add an explicit 30s timeout marker to the test class by
stacking `@pytest.mark.timeout`(30) with the existing `@pytest.mark.unit` on the
TestFactorySandboxWiring class; update the decorator list above class
TestFactorySandboxWiring so both `@pytest.mark.unit` and `@pytest.mark.timeout`(30)
appear (order doesn't matter) to enforce the 30-second timeout per test.

---

Duplicate comments:
In `@tests/unit/tools/sandbox/test_sandbox_factory.py`:
- Around line 23-24: The test classes lack the required timeout decorator; add
`@pytest.mark.timeout`(30) above each unit test class definition (e.g., add it
above class TestBuildSandboxBackends and the other test classes referenced) so
every test class has a 30-second timeout; ensure you import pytest if not
already present and place the decorator directly above the class declaration to
apply the timeout to all contained tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 685192b9-459e-4520-8fdc-7df7305c112a

📥 Commits

Reviewing files that changed from the base of the PR and between 64600a7 and f6fe155.

📒 Files selected for processing (7)
  • docs/architecture/tech-stack.md
  • docs/design/operations.md
  • src/synthorg/observability/events/sandbox.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/sandbox/factory.py
  • tests/unit/tools/sandbox/test_sandbox_factory.py
  • tests/unit/tools/test_factory_sandbox_wiring.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

For Markdown documentation, always read the relevant docs/design/ page when implementing features to ensure alignment with design specifications

Files:

  • docs/design/operations.md
  • docs/architecture/tech-stack.md
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (frozen=True, ConfigDict) for config and identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Use Pydantic v2 conventions: @computed_field for derived values instead of storing + validating redundant fields (e.g., TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...

Files:

  • src/synthorg/observability/events/sandbox.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/sandbox/factory.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use cd in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly

Files:

  • src/synthorg/observability/events/sandbox.py
  • tests/unit/tools/test_factory_sandbox_wiring.py
  • tests/unit/tools/sandbox/test_sandbox_factory.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/sandbox/factory.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Maintain 80% minimum code coverage (enforced in CI)
Use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed
Set 30 second timeout per test
ALWAYS include -n auto when running pytest with pytest-xdist, never run tests sequentially
Prefer @pytest.mark.parametrize for testing similar cases
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests. Use generic names: test-provider, test-small-001, etc. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names
Use Hypothesis for property-based testing with @given + @settings. Control via HYPOTHESIS_PROFILE env var: ci (200 examples, default) or dev (1000 examples). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins

Files:

  • tests/unit/tools/test_factory_sandbox_wiring.py
  • tests/unit/tools/sandbox/test_sandbox_factory.py
🧠 Learnings (9)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/sandbox.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to tests/**/*.py : Set 30 second timeout per test

Applied to files:

  • tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to tests/**/*.py : Use pytest markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`

Applied to files:

  • tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement

Applied to files:

  • src/synthorg/tools/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • docs/architecture/tech-stack.md
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`

Applied to files:

  • docs/architecture/tech-stack.md
🧬 Code graph analysis (2)
tests/unit/tools/test_factory_sandbox_wiring.py (2)
src/synthorg/config/schema.py (1)
  • RootConfig (387-651)
src/synthorg/tools/factory.py (1)
  • build_default_tools_from_config (158-219)
src/synthorg/tools/factory.py (2)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
src/synthorg/tools/sandbox/factory.py (2)
  • build_sandbox_backends (37-92)
  • resolve_sandbox_for_category (95-140)
🔇 Additional comments (3)
src/synthorg/observability/events/sandbox.py (1)

17-21: Event constants are well-scoped and consistent.

These additions follow the existing sandbox event naming scheme and are ready to use in structured logging paths.

docs/design/operations.md (1)

466-471: Documentation update is clear and accurately scoped.

This captures the factory responsibilities and current category wiring status without over-claiming.

docs/architecture/tech-stack.md (1)

114-114: Tech-stack convention row is aligned with the new sandbox wiring model.

Comment on lines +146 to +155
if sandbox_backends is None:
sandbox_backends = build_sandbox_backends(
config=config.sandboxing,
workspace=workspace,
)
return resolve_sandbox_for_category(
config=config.sandboxing,
backends=sandbox_backends,
category=ToolCategory.VERSION_CONTROL,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid instantiating unrelated backends in the VERSION_CONTROL-only path.

When sandbox_backends is None, Line 147 builds all configured backends (default + every override), but this code path only consumes ToolCategory.VERSION_CONTROL. That can eagerly construct unused backends (notably Docker), adding avoidable startup failures/dependencies for git-only tool builds.

♻️ Proposed fix
 def _resolve_vc_sandbox(
     *,
     config: RootConfig,
     sandbox_backends: Mapping[str, SandboxBackend] | None,
     workspace: Path,
 ) -> SandboxBackend:
@@
-    if sandbox_backends is None:
-        sandbox_backends = build_sandbox_backends(
-            config=config.sandboxing,
-            workspace=workspace,
-        )
+    if sandbox_backends is None:
+        vc_backend_name = config.sandboxing.backend_for_category(
+            ToolCategory.VERSION_CONTROL.value,
+        )
+        vc_only_config = config.sandboxing.model_copy(
+            update={
+                "default_backend": vc_backend_name,
+                "overrides": {},
+            },
+        )
+        sandbox_backends = build_sandbox_backends(
+            config=vc_only_config,
+            workspace=workspace,
+        )

As per coding guidelines, "If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/factory.py` around lines 146 - 155, The current code
eagerly calls build_sandbox_backends when sandbox_backends is None, which
constructs all configured backends (including Docker) even though only
ToolCategory.VERSION_CONTROL is needed; change the flow to only build or resolve
the backends required for VERSION_CONTROL: either pass sandbox_backends=None
into resolve_sandbox_for_category so it lazily constructs only the requested
backend, or update/build a new helper (e.g., build_sandbox_backends_for_category
or extend build_sandbox_backends with a categories/categories_allowed parameter)
and call that with category=ToolCategory.VERSION_CONTROL (keeping
config=config.sandboxing and workspace=workspace) before calling
resolve_sandbox_for_category; update references to sandbox_backends,
build_sandbox_backends, resolve_sandbox_for_category, and
ToolCategory.VERSION_CONTROL accordingly.

Comment on lines +171 to +174
results = await asyncio.gather(
*(_cleanup_one(n, b) for n, b in backend_items),
return_exceptions=True,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Use TaskGroup here (or explicitly document the exception).

Line 171 currently uses asyncio.gather(...) for a new fan-out/fan-in path. Project convention prefers structured concurrency with asyncio.TaskGroup for this pattern.

As per coding guidelines, "Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code... Prefer structured concurrency over bare create_task".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/sandbox/factory.py` around lines 171 - 174, Replace the
use of asyncio.gather for the fan-out/fan-in cleanup with structured concurrency
using asyncio.TaskGroup: inside the async function where backend_items and
_cleanup_one are used, create an asyncio.TaskGroup, start tasks for each
(_cleanup_one(n, b)) via tg.create_task(...) (or assign to a locals list if you
need results), await the TaskGroup context so exceptions propagate in a
structured way, and collect results or handle exceptions after the with
asyncio.TaskGroup() as tg block; update any code that expects the previous
results variable accordingly and ensure return_exceptions semantics are handled
explicitly (e.g., try/except per task) if you need to preserve them.

Comment on lines +49 to +50
@pytest.mark.unit
class TestFactorySandboxWiring:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add an explicit 30s timeout marker for this test class.

Please stack @pytest.mark.timeout(30) with the existing unit marker.

✅ Suggested change
 `@pytest.mark.unit`
+@pytest.mark.timeout(30)
 class TestFactorySandboxWiring:

As per coding guidelines, "Set 30 second timeout per test".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/test_factory_sandbox_wiring.py` around lines 49 - 50, Add an
explicit 30s timeout marker to the test class by stacking
`@pytest.mark.timeout`(30) with the existing `@pytest.mark.unit` on the
TestFactorySandboxWiring class; update the decorator list above class
TestFactorySandboxWiring so both `@pytest.mark.unit` and `@pytest.mark.timeout`(30)
appear (order doesn't matter) to enforce the 30-second timeout per test.

Add comment explaining why cleanup_sandbox_backends uses
asyncio.gather(return_exceptions=True) instead of TaskGroup:
TaskGroup cancels sibling tasks on BaseException, defeating
the error-isolation guarantee during shutdown.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 69-70: The current error path logs the success event
SANDBOX_FACTORY_BUILT which skews telemetry; change the logger.error call in the
error branch that now uses SANDBOX_FACTORY_BUILT to emit a failure-specific
event (e.g., SANDBOX_FACTORY_BUILD_FAILED) or include an explicit failure status
field on the emitted event so telemetry can filter it; update the event constant
declaration (and any exports) and replace the reference in the failing branch
that calls logger.error(SANDBOX_FACTORY_BUILT, error=msg) so it uses the new
failure constant (or adds status="failed") before raising ValueError(msg).
- Around line 74-84: The factory currently instantiates SubprocessSandbox and
DockerSandbox directly so if their constructors raise the exception is
propagated without backend-specific context; wrap each constructor call (the
SubprocessSandbox call that uses config.subprocess and workspace, and the
DockerSandbox call that uses config.docker and workspace) in a try/except that
logs a structured ERROR (include backend name and relevant config reference)
before re-raising the exception, then assign to backends["subprocess"] /
backends["docker"] as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b14f7f0b-8711-4da4-bfd0-0b6c6bcf13a2

📥 Commits

Reviewing files that changed from the base of the PR and between f6fe155 and c3d335e.

📒 Files selected for processing (1)
  • src/synthorg/tools/sandbox/factory.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (frozen=True, ConfigDict) for config and identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Use Pydantic v2 conventions: @computed_field for derived values instead of storing + validating redundant fields (e.g., TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...

Files:

  • src/synthorg/tools/sandbox/factory.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use cd in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly

Files:

  • src/synthorg/tools/sandbox/factory.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
🧬 Code graph analysis (1)
src/synthorg/tools/sandbox/factory.py (2)
tests/unit/tools/sandbox/conftest.py (1)
  • subprocess_sandbox (24-32)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
🔇 Additional comments (2)
src/synthorg/tools/sandbox/factory.py (2)

95-140: Category resolution path looks solid.

Good contextual warning on missing backend and clear debug logging on successful resolution.


170-188: Cleanup error-isolation behavior is well implemented.

The documented rationale plus per-backend isolation/logging aligns with the best-effort shutdown contract.

- Add SANDBOX_FACTORY_BUILD_FAILED event constant so the error path
  in build_sandbox_backends does not reuse the success event
- Wrap SubprocessSandbox/DockerSandbox constructors in try/except
  to log backend-specific context before re-raising
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/synthorg/tools/sandbox/factory.py (1)

189-198: ⚠️ Potential issue | 🟠 Major

cleanup_sandbox_backends still diverges from the TaskGroup requirement.

Line 195 keeps asyncio.gather(...) in a new fan-out cleanup path. This was previously flagged and is still a divergence from the stated TaskGroup direction for this feature; if intentional, it needs explicit spec approval/update.

🔧 TaskGroup-compatible pattern (while preserving per-backend isolation)
-    backend_items = list(backends.items())
-    results = await asyncio.gather(
-        *(_cleanup_one(n, b) for n, b in backend_items),
-        return_exceptions=True,
-    )
-    # Log BaseException subclasses (CancelledError, KeyboardInterrupt)
-    # that escaped _cleanup_one's except Exception block
-    for (name, _), result in zip(backend_items, results, strict=True):
-        if isinstance(result, BaseException):
-            logger.error(
-                SANDBOX_FACTORY_CLEANUP_FAILED,
-                backend=name,
-                error=f"unhandled exception during cleanup: {result!r}",
-            )
+    async def _cleanup_one_isolated(name: str, backend: SandboxBackend) -> None:
+        try:
+            await _cleanup_one(name, backend)
+        except asyncio.CancelledError:
+            logger.warning(
+                SANDBOX_FACTORY_CLEANUP_FAILED,
+                backend=name,
+                error="cleanup task cancelled",
+                exc_info=True,
+            )
+
+    async with asyncio.TaskGroup() as tg:
+        for name, backend in backends.items():
+            tg.create_task(_cleanup_one_isolated(name, backend))

Use this script to confirm implementation/spec alignment:

#!/bin/bash
rg -n "cleanup_sandbox_backends|asyncio\\.gather|TaskGroup" src/synthorg/tools/sandbox/factory.py
rg -n "TaskGroup|sandboxing|11\\.1\\.2|15\\.5" docs CLAUDE.md || true

As per coding guidelines, "Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code ... Prefer structured concurrency over bare create_task" and "If implementation deviates from the design spec, alert the user and explain why".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/sandbox/factory.py` around lines 189 - 198, The current
cleanup uses asyncio.gather inside cleanup_sandbox_backends which deviates from
the TaskGroup requirement; replace the gather call with an asyncio.TaskGroup and
create a separate task for each backend using tg.create_task, but wrap each call
to _cleanup_one(n, b) in a small try/except wrapper coroutine that catches and
records exceptions (so an exception in one backend does not cancel sibling
tasks); collect results/exceptions into the same results list or a dict keyed by
backend name (use backend_items, _cleanup_one, and cleanup_sandbox_backends to
locate code) and ensure return_exceptions-like behavior by storing exceptions
rather than letting them propagate out of the TaskGroup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 38-111: The build_sandbox_backends function exceeds the 50-line
limit; refactor by extracting validation and instantiation into small helpers:
create a validate_backends(needed: set[str]) function that checks needed against
_KNOWN_BACKENDS and logs/raises using SANDBOX_FACTORY_BUILD_FAILED, and an
instantiate_backend(name: str, config, workspace: Path) -> SandboxBackend helper
that constructs SubprocessSandbox or DockerSandbox, logs errors with
SANDBOX_FACTORY_BUILD_FAILED and re-raises; then have build_sandbox_backends
compute needed, call validate_backends(needed), loop needed to call
instantiate_backend for each required backend, collect into backends, and
finally log SANDBOX_FACTORY_BUILT and return MappingProxyType(backends).

---

Duplicate comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 189-198: The current cleanup uses asyncio.gather inside
cleanup_sandbox_backends which deviates from the TaskGroup requirement; replace
the gather call with an asyncio.TaskGroup and create a separate task for each
backend using tg.create_task, but wrap each call to _cleanup_one(n, b) in a
small try/except wrapper coroutine that catches and records exceptions (so an
exception in one backend does not cancel sibling tasks); collect
results/exceptions into the same results list or a dict keyed by backend name
(use backend_items, _cleanup_one, and cleanup_sandbox_backends to locate code)
and ensure return_exceptions-like behavior by storing exceptions rather than
letting them propagate out of the TaskGroup.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6816dbe0-d6db-400d-bcad-d643370c8b19

📥 Commits

Reviewing files that changed from the base of the PR and between c3d335e and 6b09681.

📒 Files selected for processing (2)
  • src/synthorg/observability/events/sandbox.py
  • src/synthorg/tools/sandbox/factory.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (frozen=True, ConfigDict) for config and identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Use Pydantic v2 conventions: @computed_field for derived values instead of storing + validating redundant fields (e.g., TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...

Files:

  • src/synthorg/tools/sandbox/factory.py
  • src/synthorg/observability/events/sandbox.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use cd in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly

Files:

  • src/synthorg/tools/sandbox/factory.py
  • src/synthorg/observability/events/sandbox.py
🧠 Learnings (5)
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/sandbox.py
🧬 Code graph analysis (1)
src/synthorg/tools/sandbox/factory.py (2)
src/synthorg/tools/sandbox/docker_sandbox.py (1)
  • cleanup (601-620)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
🔇 Additional comments (2)
src/synthorg/observability/events/sandbox.py (1)

17-22: Good additive sandbox factory event coverage.

These constants cleanly cover build/resolve/cleanup success and failure paths and fit the existing event taxonomy.

src/synthorg/tools/sandbox/factory.py (1)

64-103: Strong error-path observability in backend construction.

Validation and constructor failure branches log structured context before re-raising, which improves startup diagnostics.

Comment on lines +38 to +111
def build_sandbox_backends(
*,
config: SandboxingConfig,
workspace: Path,
) -> MappingProxyType[str, SandboxBackend]:
"""Build only the backend instances actually referenced by *config*.

Collects which backend names are needed (the default plus all
override values), then instantiates ``SubprocessSandbox`` and/or
``DockerSandbox`` with their respective sub-configs.

Args:
config: Top-level sandboxing configuration.
workspace: Absolute path to the agent workspace root.

Returns:
A read-only mapping of backend name to backend instance.
Only contains keys for backends that are actually referenced.

Raises:
ValueError: If *config* references an unrecognized backend
name not in the known set (``subprocess``, ``docker``).
"""
needed: set[str] = {config.default_backend}
needed.update(config.overrides.values())

unknown = needed - _KNOWN_BACKENDS
if unknown:
msg = (
f"Unrecognized sandbox backend name(s): {sorted(unknown)}; "
f"known backends: {sorted(_KNOWN_BACKENDS)}"
)
logger.error(SANDBOX_FACTORY_BUILD_FAILED, error=msg)
raise ValueError(msg)

backends: dict[str, SandboxBackend] = {}

if "subprocess" in needed:
try:
backends["subprocess"] = SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)
except Exception:
logger.error(
SANDBOX_FACTORY_BUILD_FAILED,
backend="subprocess",
workspace=str(workspace),
exc_info=True,
)
raise

if "docker" in needed:
try:
backends["docker"] = DockerSandbox(
config=config.docker,
workspace=workspace,
)
except Exception:
logger.error(
SANDBOX_FACTORY_BUILD_FAILED,
backend="docker",
workspace=str(workspace),
exc_info=True,
)
raise

logger.info(
SANDBOX_FACTORY_BUILT,
backends=sorted(backends.keys()),
default=config.default_backend,
override_count=len(config.overrides),
)
return MappingProxyType(backends)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Refactor build_sandbox_backends to stay within function-size limits.

This function is currently over the project’s 50-line limit; split validation/instantiation into helpers to keep the public function focused.

♻️ Proposed refactor split
+def _collect_needed_backends(config: SandboxingConfig) -> set[str]:
+    needed: set[str] = {config.default_backend}
+    needed.update(config.overrides.values())
+    return needed
+
+def _validate_needed_backends(needed: set[str]) -> None:
+    unknown = needed - _KNOWN_BACKENDS
+    if unknown:
+        msg = (
+            f"Unrecognized sandbox backend name(s): {sorted(unknown)}; "
+            f"known backends: {sorted(_KNOWN_BACKENDS)}"
+        )
+        logger.error(SANDBOX_FACTORY_BUILD_FAILED, error=msg)
+        raise ValueError(msg)
+
 def build_sandbox_backends(
@@
-    needed: set[str] = {config.default_backend}
-    needed.update(config.overrides.values())
-
-    unknown = needed - _KNOWN_BACKENDS
-    if unknown:
-        msg = (
-            f"Unrecognized sandbox backend name(s): {sorted(unknown)}; "
-            f"known backends: {sorted(_KNOWN_BACKENDS)}"
-        )
-        logger.error(SANDBOX_FACTORY_BUILD_FAILED, error=msg)
-        raise ValueError(msg)
+    needed = _collect_needed_backends(config)
+    _validate_needed_backends(needed)
As per coding guidelines, "Keep functions under 50 lines and files under 800 lines".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/sandbox/factory.py` around lines 38 - 111, The
build_sandbox_backends function exceeds the 50-line limit; refactor by
extracting validation and instantiation into small helpers: create a
validate_backends(needed: set[str]) function that checks needed against
_KNOWN_BACKENDS and logs/raises using SANDBOX_FACTORY_BUILD_FAILED, and an
instantiate_backend(name: str, config, workspace: Path) -> SandboxBackend helper
that constructs SubprocessSandbox or DockerSandbox, logs errors with
SANDBOX_FACTORY_BUILD_FAILED and re-raises; then have build_sandbox_backends
compute needed, call validate_backends(needed), loop needed to call
instantiate_backend for each required backend, collect into backends, and
finally log SANDBOX_FACTORY_BUILT and return MappingProxyType(backends).

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 48-56: The dispatch currently treats any non-"subprocess" backend
as Docker, which is fragile; modify the branch that uses the name variable so it
explicitly checks for both "subprocess" and "docker" (e.g., if name ==
"subprocess": return SubprocessSandbox(...); elif name == "docker": return
DockerSandbox(...)) and raise a clear exception (ValueError or custom) for any
other name instead of falling back to Docker; keep using config.subprocess,
config.docker and workspace when constructing the respective SubprocessSandbox
and DockerSandbox instances so callers get deterministic behavior and clear
errors for unsupported backends.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9535369-8af8-4cbe-8a21-2edfe797046c

📥 Commits

Reviewing files that changed from the base of the PR and between 6b09681 and 5aecbfd.

📒 Files selected for processing (1)
  • src/synthorg/tools/sandbox/factory.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Deploy Preview
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (frozen=True, ConfigDict) for config and identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Use Pydantic v2 conventions: @computed_field for derived values instead of storing + validating redundant fields (e.g., TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...

Files:

  • src/synthorg/tools/sandbox/factory.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use cd in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly

Files:

  • src/synthorg/tools/sandbox/factory.py
🧠 Learnings (5)
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/synthorg/tools/sandbox/factory.py
🧬 Code graph analysis (1)
src/synthorg/tools/sandbox/factory.py (3)
src/synthorg/tools/sandbox/docker_sandbox.py (4)
  • DockerSandbox (70-647)
  • config (113-115)
  • workspace (118-120)
  • cleanup (601-620)
tests/unit/tools/sandbox/conftest.py (1)
  • subprocess_sandbox (24-32)
src/synthorg/core/enums.py (1)
  • ToolCategory (294-308)
🔇 Additional comments (4)
src/synthorg/tools/sandbox/factory.py (4)

13-35: Observability wiring looks solid.

Using get_logger(__name__) with domain event constants here is consistent and clean.


67-113: build_sandbox_backends implementation matches the objective well.

Validation, selective instantiation, and read-only return via MappingProxyType are all correctly handled.


115-160: Category resolution path is robust and well-instrumented.

The missing-backend error path includes useful context and preserves failure semantics via KeyError.


163-208: Cleanup strategy is appropriately resilient for best-effort shutdown.

The explicit gather(return_exceptions=True) rationale here is clear and preserves per-backend isolation during cleanup.

Comment on lines +48 to +56
if name == "subprocess":
return SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)
return DockerSandbox(
config=config.docker,
workspace=workspace,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Make backend dispatch explicit instead of implicit docker fallback.

Any non-"subprocess" value currently routes to DockerSandbox. That’s fragile if this helper is reused without prior validation.

♻️ Proposed defensive dispatch
     try:
         if name == "subprocess":
             return SubprocessSandbox(
                 config=config.subprocess,
                 workspace=workspace,
             )
-        return DockerSandbox(
-            config=config.docker,
-            workspace=workspace,
-        )
+        if name == "docker":
+            return DockerSandbox(
+                config=config.docker,
+                workspace=workspace,
+            )
+        msg = f"Unsupported sandbox backend: {name!r}"
+        logger.error(SANDBOX_FACTORY_BUILD_FAILED, backend=name, error=msg)
+        raise ValueError(msg)
     except Exception:
         logger.error(
             SANDBOX_FACTORY_BUILD_FAILED,
             backend=name,
             workspace=str(workspace),
             exc_info=True,
         )
         raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if name == "subprocess":
return SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)
return DockerSandbox(
config=config.docker,
workspace=workspace,
)
if name == "subprocess":
return SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)
if name == "docker":
return DockerSandbox(
config=config.docker,
workspace=workspace,
)
msg = f"Unsupported sandbox backend: {name!r}"
logger.error(SANDBOX_FACTORY_BUILD_FAILED, backend=name, error=msg)
raise ValueError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/sandbox/factory.py` around lines 48 - 56, The dispatch
currently treats any non-"subprocess" backend as Docker, which is fragile;
modify the branch that uses the name variable so it explicitly checks for both
"subprocess" and "docker" (e.g., if name == "subprocess": return
SubprocessSandbox(...); elif name == "docker": return DockerSandbox(...)) and
raise a clear exception (ValueError or custom) for any other name instead of
falling back to Docker; keep using config.subprocess, config.docker and
workspace when constructing the respective SubprocessSandbox and DockerSandbox
instances so callers get deterministic behavior and clear errors for unsupported
backends.

Add explicit check before try block so unknown names raise
ValueError instead of silently falling through to DockerSandbox.
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 18, 2026 07:29 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 311a1ab into main Mar 18, 2026
30 of 32 checks passed
@Aureliolo Aureliolo deleted the feat/sandbox-backend-wiring branch March 18, 2026 07:32
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 18, 2026 07:32 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 18, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.3](v0.3.2...v0.3.3)
(2026-03-18)


### Features

* **backup:** implement automated backup and restore system
([#541](#541))
([867b7c1](867b7c1))
* **providers:** runtime provider management with CRUD, presets, and
multi-auth ([#540](#540))
([936c345](936c345)),
closes [#451](#451)
* **tools:** wire per-category sandbox backend selection
([#534](#534))
([311a1ab](311a1ab))


### Bug Fixes

* **cli:** switch cosign verification from .sig tags to OCI referrers
([#533](#533))
([8ee5471](8ee5471)),
closes [#532](#532)


### CI/CD

* bump wrangler from 4.74.0 to 4.75.0 in /.github in the minor-and-patch
group ([#535](#535))
([de15867](de15867))


### Maintenance

* bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 in
/cli in the minor-and-patch group
([#536](#536))
([4a09aed](4a09aed))
* bump litellm from 1.82.3 to 1.82.4 in the minor-and-patch group
([#538](#538))
([9f7f83d](9f7f83d))
* bump vue-tsc from 3.2.5 to 3.2.6 in /web in the minor-and-patch group
across 1 directory
([#537](#537))
([eb3dc4e](eb3dc4e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Aureliolo added a commit that referenced this pull request Mar 18, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.3](v0.3.2...v0.3.3)
(2026-03-18)


### Features

* **backup:** implement automated backup and restore system
([#541](#541))
([867b7c1](867b7c1))
* **providers:** runtime provider management with CRUD, presets, and
multi-auth ([#540](#540))
([936c345](936c345)),
closes [#451](#451)
* **tools:** wire per-category sandbox backend selection
([#534](#534))
([311a1ab](311a1ab))


### Bug Fixes

* **ci:** add COSIGN_EXPERIMENTAL=1 for OCI referrer mode in cosign sign
([#543](#543))
([226ed2f](226ed2f))
* **cli:** switch cosign verification from .sig tags to OCI referrers
([#533](#533))
([8ee5471](8ee5471)),
closes [#532](#532)


### CI/CD

* bump wrangler from 4.74.0 to 4.75.0 in /.github in the minor-and-patch
group ([#535](#535))
([de15867](de15867))


### Maintenance

* bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 in
/cli in the minor-and-patch group
([#536](#536))
([4a09aed](4a09aed))
* bump litellm from 1.82.3 to 1.82.4 in the minor-and-patch group
([#538](#538))
([9f7f83d](9f7f83d))
* bump vue-tsc from 3.2.5 to 3.2.6 in /web in the minor-and-patch group
across 1 directory
([#537](#537))
([eb3dc4e](eb3dc4e))
* **main:** release 0.3.3
([#539](#539))
([c3de2a2](c3de2a2))
* revert v0.3.3 release artifacts (Docker signing failed)
([#544](#544))
([7f48f52](7f48f52))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: wire engine-level per-category sandbox backend selection from config

1 participant