feat(tools): wire per-category sandbox backend selection #534
Bridge the config-to-runtime gap for SandboxingConfig by adding a sandbox backend factory that builds only the backends that are needed, resolves the correct backend per tool category, and wires it into build_default_tools_from_config with a backward-compatible fallback. Closes #265
Pre-reviewed by 11 agents, 12 findings addressed:

- cleanup_sandbox_backends: error isolation + TaskGroup for parallel cleanup
- resolve_sandbox_for_category: descriptive KeyError on missing backend
- build_sandbox_backends: return MappingProxyType for immutability
- DockerSandbox.get_backend_type: use NotBlankStr for consistency
- CLAUDE.md: update Package Structure with sandbox factory
- Tests: remove redundant @pytest.mark.asyncio, add guard assertions, fix test_tool_count_unchanged to test both code paths, add missing error-path tests
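The MappingProxyType finding above can be sketched with stand-in values. This is illustrative only, not the PR's actual factory: the function name and string "backends" here are made up for the demo.

```python
from types import MappingProxyType
from typing import Mapping

def build_backends(needed: set[str]) -> Mapping[str, str]:
    # Stand-in "backends" are plain strings; the real factory builds objects.
    backends = {name: f"{name}-backend" for name in sorted(needed)}
    # Wrap in a read-only proxy so callers cannot mutate the internal dict.
    return MappingProxyType(backends)

view = build_backends({"subprocess"})
print(view["subprocess"])  # → subprocess-backend
# view["docker"] = "x" would raise TypeError: mappingproxy is read-only
```

The proxy shares storage with the underlying dict, so it costs nothing to create while still rejecting item assignment from callers.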
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
📝 Walkthrough (Summary by CodeRabbit)
Walkthrough: Adds a sandbox backend factory (build, resolve, cleanup), wires per-category sandbox resolution into the tools factory, exposes the sandbox API, adds sandbox observability event constants, and includes unit/integration tests plus documentation updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/Engine
    participant Factory as Tools Factory
    participant SandboxFactory as Sandbox Factory
    participant Config as SandboxingConfig
    participant Backends as Backend Instances
    participant Tool as Tool Instance
    Client->>Factory: build_default_tools_from_config(config, sandbox_backends=None)
    Factory->>SandboxFactory: build_sandbox_backends(config, workspace)
    SandboxFactory->>Config: read default_backend and overrides
    SandboxFactory->>Backends: instantiate required backends
    SandboxFactory-->>Factory: return Mapping[str, SandboxBackend]
    loop For each tool needing sandbox
        Factory->>SandboxFactory: resolve_sandbox_for_category(config, backends, ToolCategory)
        SandboxFactory->>Config: get_backend_for_category(category.value)
        SandboxFactory->>Backends: lookup backend by name
        SandboxFactory-->>Factory: return SandboxBackend
    end
    Factory-->>Client: return tools with sandboxes wired
```
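The resolution loop in the diagram, combined with the "descriptive KeyError" finding from the pre-review, can be sketched as follows. This is a minimal stand-in: the SandboxingConfig and ToolCategory classes and the error wording here are illustrative, not the PR's actual code.

```python
from enum import Enum
from typing import Mapping

class ToolCategory(Enum):
    VERSION_CONTROL = "version_control"

class SandboxingConfig:
    """Stand-in config: overrides map category names to backend names."""

    def __init__(self, default_backend: str, overrides: dict[str, str]) -> None:
        self.default_backend = default_backend
        self.overrides = overrides

    def get_backend_for_category(self, category: str) -> str:
        return self.overrides.get(category, self.default_backend)

def resolve_sandbox_for_category(
    config: SandboxingConfig,
    backends: Mapping[str, object],
    category: ToolCategory,
) -> object:
    name = config.get_backend_for_category(category.value)
    try:
        return backends[name]
    except KeyError as exc:
        # Descriptive diagnostics: the missing name, the category, and what exists.
        raise KeyError(
            f"backend {name!r} not built for category {category.value!r}; "
            f"available: {sorted(backends)}"
        ) from exc

cfg = SandboxingConfig("subprocess", {"version_control": "docker"})
try:
    resolve_sandbox_for_category(cfg, {"subprocess": object()}, ToolCategory.VERSION_CONTROL)
except KeyError as err:
    print(err)  # names 'docker', 'version_control', and the available keys
```

Chaining with `from exc` preserves the original lookup failure, which matches the later review note about keeping the exception chain.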
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 checks passed.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the sandboxing capabilities by introducing a dedicated factory for managing sandbox backends. This allows more granular control, enabling different tool categories to use specific sandbox implementations based on configuration. The change improves flexibility and resource management by instantiating only the required backends and providing robust cleanup, laying the groundwork for more sophisticated tool isolation strategies.
Code Review
This pull request introduces a sandbox backend factory for per-category backend selection, a significant improvement for configurability. The implementation is high-quality, with robust logic, good use of modern asyncio features like TaskGroup, and exceptionally thorough unit and integration tests. I have one suggestion to improve the extensibility of the new factory.
```python
backends: dict[str, SandboxBackend] = {}

if "subprocess" in needed:
    backends["subprocess"] = SubprocessSandbox(
        config=config.subprocess,
        workspace=workspace,
    )

if "docker" in needed:
    backends["docker"] = DockerSandbox(
        config=config.docker,
        workspace=workspace,
    )
```
The current implementation uses a series of if statements to build the backend instances. This pattern can become cumbersome to maintain as more backend types are added. A more extensible approach would be to use a registry pattern (a dictionary mapping backend names to their constructor classes). This would centralize the backend registration and make the building logic more generic and concise. The suggested implementation uses a dictionary comprehension, assuming that the attribute name in SandboxingConfig for a backend's specific configuration matches the backend's name (e.g., config.docker for the 'docker' backend), which is the current convention.
Suggested change:

```python
backends: dict[str, SandboxBackend] = {
    name: builder(config=getattr(config, name), workspace=workspace)
    for name, builder in {
        "subprocess": SubprocessSandbox,
        "docker": DockerSandbox,
    }.items()
    if name in needed
}
```
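A runnable sketch of the registry pattern the reviewer proposes, with stand-in backend classes and config. Everything below is illustrative: FakeSandboxingConfig, the stub sandbox classes, and `build` are stand-ins for the repository's real types.

```python
from dataclasses import dataclass, field

@dataclass
class FakeSandboxingConfig:
    # Attribute names match registry keys, mirroring the stated convention.
    subprocess: dict = field(default_factory=dict)
    docker: dict = field(default_factory=dict)

class SubprocessSandbox:
    def __init__(self, *, config: dict, workspace: str) -> None:
        self.config, self.workspace = config, workspace

class DockerSandbox:
    def __init__(self, *, config: dict, workspace: str) -> None:
        self.config, self.workspace = config, workspace

# Central registry: adding a backend is one new entry, not a new if-branch.
_BACKEND_REGISTRY = {
    "subprocess": SubprocessSandbox,
    "docker": DockerSandbox,
}

def build(config: FakeSandboxingConfig, workspace: str, needed: set[str]) -> dict:
    return {
        name: builder(config=getattr(config, name), workspace=workspace)
        for name, builder in _BACKEND_REGISTRY.items()
        if name in needed
    }

backends = build(FakeSandboxingConfig(), "/tmp/ws", {"subprocess"})
print(sorted(backends))  # → ['subprocess']
```

The `getattr(config, name)` convention couples registry keys to config attribute names; if that convention ever breaks, the registry values could become (attribute_name, builder) pairs instead.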
Codecov Report: ❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #534      +/-   ##
==========================================
- Coverage   93.19%   93.16%   -0.03%
==========================================
  Files         522      523       +1
  Lines       25247    25314      +67
  Branches     2396     2402       +6
==========================================
+ Hits        23530    23585      +55
- Misses       1360     1369       +9
- Partials      357      360       +3
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/tools/sandbox/docker_sandbox.py`:
- Around line 645-647: The get_backend_type function currently constructs
NotBlankStr("docker") incorrectly (NotBlankStr is a type
alias/Annotated[str,...]); change it to return a plain string instead — replace
NotBlankStr("docker") with "docker" and annotate the return to satisfy typing
(e.g. add a type ignore comment like `# type: ignore[return-value]` or use
typing.cast(NotBlankStr, "docker")) so get_backend_type returns a plain str
while preserving the NotBlankStr return annotation.
In `@tests/unit/tools/sandbox/test_sandbox_factory.py`:
- Around line 22-23: Add a 30-second pytest timeout marker to the test classes
that currently only have `@pytest.mark.unit` so each test gets an explicit cap;
specifically, add `@pytest.mark.timeout(30)` stacked with the existing
`@pytest.mark.unit` above the TestBuildSandboxBackends class (and the other two
class declarations referenced at the same spots in the file) so the decorators
for those classes include both markers.
- Around line 224-239: Update the test_raises_key_error_when_backend_missing
assertion to verify the KeyError message contains descriptive diagnostics: when
calling resolve_sandbox_for_category with config (SandboxingConfig) and category
ToolCategory.VERSION_CONTROL and backends map, assert the raised KeyError
message includes both the missing backend name ("docker" or config.overrides
value), the category identifier (e.g., "version_control" or
ToolCategory.VERSION_CONTROL), and the list of available backends (from the
backends keys) so the match is more specific and will fail if the descriptive
message regresses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4c4eef4c-9d30-4c31-9aed-0ed66543f6e2
📒 Files selected for processing (9)
- CLAUDE.md
- src/synthorg/observability/events/sandbox.py
- src/synthorg/tools/factory.py
- src/synthorg/tools/sandbox/__init__.py
- src/synthorg/tools/sandbox/docker_sandbox.py
- src/synthorg/tools/sandbox/factory.py
- tests/integration/tools/test_sandbox_wiring_integration.py
- tests/unit/tools/sandbox/test_sandbox_factory.py
- tests/unit/tools/test_factory_sandbox_wiring.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Deploy Preview
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

`src/synthorg/**/*.py`:
- Always read the relevant `docs/design/` page before implementing any feature or planning any issue
- If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
- No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
- Use `except A, B:` syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
- All public functions must have type hints; use mypy strict mode for type checking
- Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
- Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement
- For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
- Use frozen Pydantic models (`frozen=True`, `ConfigDict`) for config and identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves
- Use Pydantic v2 conventions: `@computed_field` for derived values instead of storing + validating redundant fields (e.g., `TokenUsage.total_tokens`)
- Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators
- Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
- Keep functions under 50 lines and files under 800 lines
- Handle errors explicitly, never silently swallow excepti...

Files:
- src/synthorg/observability/events/sandbox.py
- src/synthorg/tools/sandbox/factory.py
- src/synthorg/tools/sandbox/__init__.py
- src/synthorg/tools/sandbox/docker_sandbox.py
- src/synthorg/tools/factory.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use `cd` in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly

Files:
- src/synthorg/observability/events/sandbox.py
- tests/integration/tools/test_sandbox_wiring_integration.py
- src/synthorg/tools/sandbox/factory.py
- src/synthorg/tools/sandbox/__init__.py
- src/synthorg/tools/sandbox/docker_sandbox.py
- tests/unit/tools/sandbox/test_sandbox_factory.py
- src/synthorg/tools/factory.py
- tests/unit/tools/test_factory_sandbox_wiring.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

`tests/**/*.py`:
- Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
- Maintain 80% minimum code coverage (enforced in CI)
- Use `asyncio_mode = "auto"` — no manual `@pytest.mark.asyncio` needed
- Set 30 second timeout per test
- ALWAYS include `-n auto` when running pytest with pytest-xdist, never run tests sequentially
- Prefer `@pytest.mark.parametrize` for testing similar cases
- NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests. Use generic names: `test-provider`, `test-small-001`, etc. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) `.claude/` skill/agent files, (3) third-party import paths/module names
- Use Hypothesis for property-based testing with `@given` + `@settings`. Control via `HYPOTHESIS_PROFILE` env var: `ci` (200 examples, default) or `dev` (1000 examples). Run dev profile: `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`
- NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Files:
- tests/integration/tools/test_sandbox_wiring_integration.py
- tests/unit/tools/sandbox/test_sandbox_factory.py
- tests/unit/tools/test_factory_sandbox_wiring.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

For Markdown documentation, always read the relevant `docs/design/` page when implementing features to ensure alignment with design specifications

Files:
- CLAUDE.md
🧠 Learnings (3)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/sandbox.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators
Applied to files:
src/synthorg/tools/sandbox/docker_sandbox.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement
Applied to files:
src/synthorg/tools/factory.py
🧬 Code graph analysis (5)
tests/integration/tools/test_sandbox_wiring_integration.py (2)
- src/synthorg/config/loader.py (1): load_config_from_string (495-524)
- src/synthorg/tools/factory.py (1): build_default_tools_from_config (136-202)

src/synthorg/tools/sandbox/factory.py (2)
- src/synthorg/tools/sandbox/docker_sandbox.py (4): DockerSandbox (70-647), config (113-115), workspace (118-120), cleanup (601-620)
- src/synthorg/core/enums.py (1): ToolCategory (294-308)

tests/unit/tools/sandbox/test_sandbox_factory.py (2)
- src/synthorg/core/enums.py (1): ToolCategory (294-308)
- src/synthorg/tools/sandbox/factory.py (3): build_sandbox_backends (33-75), cleanup_sandbox_backends (126-153), resolve_sandbox_for_category (78-123)

src/synthorg/tools/factory.py (2)
- src/synthorg/core/enums.py (1): ToolCategory (294-308)
- src/synthorg/tools/sandbox/factory.py (2): build_sandbox_backends (33-75), resolve_sandbox_for_category (78-123)

tests/unit/tools/test_factory_sandbox_wiring.py (3)
- src/synthorg/tools/sandbox/docker_sandbox.py (2): config (113-115), workspace (118-120)
- src/synthorg/config/schema.py (1): RootConfig (387-651)
- src/synthorg/tools/factory.py (1): build_default_tools_from_config (136-202)
🔇 Additional comments (7)

src/synthorg/observability/events/sandbox.py (1)
17-19: Looks good — new sandbox factory events are consistent. The added `SANDBOX_FACTORY_*` constants follow the existing event naming scheme and fit this domain module cleanly.

CLAUDE.md (1)
136-136: Documentation update is accurate and useful. The package structure entry clearly documents the new sandbox factory surface and responsibilities.

src/synthorg/tools/sandbox/__init__.py (1)
7-11: Public sandbox API export wiring is clean. Imports and `__all__` were updated consistently, so the new factory helpers are exposed as intended. Also applies to: 28-30

tests/integration/tools/test_sandbox_wiring_integration.py (1)
14-135: Integration coverage for sandbox wiring is solid. The test set exercises default, override, and config-propagation paths through `load_config_from_string` + `build_default_tools_from_config` effectively.

src/synthorg/tools/factory.py (1)
136-202: Per-category sandbox resolution wiring is correctly prioritized. The fallback order and `VERSION_CONTROL` resolution path are clear and consistent with the stated behavior.

src/synthorg/tools/sandbox/factory.py (1)
33-75: Factory implementation is well-structured and aligned with the objective. Build, resolve, and cleanup responsibilities are clearly separated, with immutable backend mapping and structured parallel cleanup. Also applies to: 78-123, 126-153

tests/unit/tools/test_factory_sandbox_wiring.py (1)
49-228: Unit test coverage for sandbox wiring precedence is comprehensive. The cases validate explicit override behavior, provided backend-map behavior, auto-build behavior, and unaffected tool categories.
```diff
 def get_backend_type(self) -> NotBlankStr:
     """Return ``'docker'``."""
-    return "docker"
+    return NotBlankStr("docker")
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== NotBlankStr definition =="
rg -n -C2 '^\s*NotBlankStr\s*=' src/synthorg/core/types.py
echo
echo "== get_backend_type implementations =="
rg -n -C2 '\bdef get_backend_type\s*\(' src/synthorg/tools/sandbox
echo
echo "== Constructor-style NotBlankStr usage =="
rg -n '\bNotBlankStr\s*\(' src/synthorg
```

Repository: Aureliolo/synthorg
Length of output: 9129
Return a plain string instead of calling NotBlankStr as a constructor.
NotBlankStr is a type alias (Annotated[str, ...]), not a callable type. While NotBlankStr("docker") won't fail at runtime, it's incorrect usage. Instead, return the plain string:
```python
def get_backend_type(self) -> NotBlankStr:
    """Return ``'docker'``."""
    return "docker"  # type: ignore[return-value]
```

Or use cast for clarity:
```python
from typing import cast

return cast(NotBlankStr, "docker")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/tools/sandbox/docker_sandbox.py` around lines 645 - 647, The
get_backend_type function currently constructs NotBlankStr("docker") incorrectly
(NotBlankStr is a type alias/Annotated[str,...]); change it to return a plain
string instead — replace NotBlankStr("docker") with "docker" and annotate the
return to satisfy typing (e.g. add a type ignore comment like `# type:
ignore[return-value]` or use typing.cast(NotBlankStr, "docker")) so
get_backend_type returns a plain str while preserving the NotBlankStr return
annotation.
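The reviewer's claim that `NotBlankStr("docker")` "won't fail at runtime" can be checked directly: calling an `Annotated` alias delegates to its origin type. The alias below is a rough stand-in for `core.types.NotBlankStr`, which presumably carries real validation metadata.

```python
from typing import Annotated

# Rough stand-in; the real NotBlankStr attaches non-blank validation metadata.
NotBlankStr = Annotated[str, "must not be blank"]

value = NotBlankStr("docker")  # delegates to str("docker"), so it "works"...
print(value, type(value).__name__)  # → docker str
# ...but the metadata is ignored: this is just str("docker") at runtime,
# which is why returning the plain string (with a cast) is the clearer fix.
```

Because the call is effectively `str("docker")`, it adds no validation and only obscures the type, matching the review's recommendation.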
```python
@pytest.mark.unit
class TestBuildSandboxBackends:
```
Add a 30s timeout marker for these test classes
These classes currently only carry @pytest.mark.unit. Add a timeout marker so each contained test has an explicit 30-second cap.
Suggested class-level timeout markers:

```diff
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestBuildSandboxBackends:
@@
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestResolveSandboxForCategory:
@@
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestCleanupSandboxBackends:
```

As per coding guidelines, "Set 30 second timeout per test".
Also applies to: 158-159, 242-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/sandbox/test_sandbox_factory.py` around lines 22-23: add a
30-second pytest timeout marker to the test classes that currently only have
`@pytest.mark.unit` so each test gets an explicit cap; specifically, add
`@pytest.mark.timeout(30)` stacked with the existing `@pytest.mark.unit` above the
TestBuildSandboxBackends class (and the other two class declarations referenced
at the same spots in the file) so the decorators for those classes include both
markers.
```python
def test_raises_key_error_when_backend_missing(self) -> None:
    """KeyError with descriptive message when backend not in map."""
    config = SandboxingConfig(
        default_backend="subprocess",
        overrides={"version_control": "docker"},
    )
    backends: Mapping[str, SandboxBackend] = {
        "subprocess": MagicMock(spec=SandboxBackend),
    }

    with pytest.raises(KeyError, match="docker"):
        resolve_sandbox_for_category(
            config=config,
            backends=backends,
            category=ToolCategory.VERSION_CONTROL,
        )
```
🧹 Nitpick | 🔵 Trivial
Strengthen the KeyError assertion to lock in descriptive diagnostics
`match="docker"` is too broad and would still pass if the message regresses to a bare backend name. Assert key message fragments (category + available backends) to protect the intended behavior.
Suggested test assertion hardening:

```diff
-        with pytest.raises(KeyError, match="docker"):
+        with pytest.raises(
+            KeyError,
+            match=r"resolved for category 'version_control'.*available: \['subprocess'\]",
+        ):
             resolve_sandbox_for_category(
                 config=config,
                 backends=backends,
                 category=ToolCategory.VERSION_CONTROL,
             )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_raises_key_error_when_backend_missing(self) -> None:
    """KeyError with descriptive message when backend not in map."""
    config = SandboxingConfig(
        default_backend="subprocess",
        overrides={"version_control": "docker"},
    )
    backends: Mapping[str, SandboxBackend] = {
        "subprocess": MagicMock(spec=SandboxBackend),
    }

    with pytest.raises(
        KeyError,
        match=r"resolved for category 'version_control'.*available: \['subprocess'\]",
    ):
        resolve_sandbox_for_category(
            config=config,
            backends=backends,
            category=ToolCategory.VERSION_CONTROL,
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/sandbox/test_sandbox_factory.py` around lines 224 - 239,
Update the test_raises_key_error_when_backend_missing assertion to verify the
KeyError message contains descriptive diagnostics: when calling
resolve_sandbox_for_category with config (SandboxingConfig) and category
ToolCategory.VERSION_CONTROL and backends map, assert the raised KeyError
message includes both the missing backend name ("docker" or config.overrides
value), the category identifier (e.g., "version_control" or
ToolCategory.VERSION_CONTROL), and the list of available backends (from the
backends keys) so the match is more specific and will fail if the descriptive
message regresses.
…abbit

- Replace TaskGroup with gather(return_exceptions=True) in cleanup_sandbox_backends for shutdown-resilient error isolation
- Make cleanup_sandbox_backends keyword-only for API consistency
- Add SANDBOX_FACTORY_RESOLVE_FAILED and SANDBOX_FACTORY_CLEANUP_FAILED event constants for distinct success/error log filtering
- Extract _resolve_vc_sandbox helper to keep build_default_tools_from_config under 50-line limit
- Add KeyError to Raises section of build_default_tools_from_config docstring
- Change `from None` to `from exc` in resolve_sandbox_for_category to preserve exception chain
- Add unknown backend name validation in build_sandbox_backends
- Add cleanup logging assertion in test_cleanup_continues_on_error
- Strengthen KeyError assertion to lock in diagnostic message content
- Update docs/architecture/tech-stack.md: DockerSandbox no longer "planned"
- Update docs/design/operations.md: document sandbox factory implementation
- Add explanatory comment to type: ignore in test helper
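The gather-based cleanup in the first bullet can be sketched as below. This is a hedged reconstruction from the commit message with stand-in backends; the real function's logging, events, and signatures will differ.

```python
import asyncio
import logging
from typing import Mapping

logger = logging.getLogger(__name__)

async def cleanup_sandbox_backends(*, backends: Mapping[str, object]) -> None:
    """Clean up every backend; one failure does not abort the others."""

    async def _cleanup_one(backend: object) -> None:
        await backend.cleanup()  # stand-in: assumes a cleanup() coroutine

    results = await asyncio.gather(
        *(_cleanup_one(b) for b in backends.values()),
        return_exceptions=True,  # shutdown-resilient: collect, don't raise
    )
    for name, result in zip(backends, results):
        if isinstance(result, BaseException):
            logger.error("cleanup failed for backend %r: %s", name, result)
```

Unlike TaskGroup, which cancels siblings on the first failure, `gather(return_exceptions=True)` lets every cleanup run to completion and reports failures afterwards, which is the shutdown behavior the bullet describes.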
Actionable comments posted: 3
♻️ Duplicate comments (1)

tests/unit/tools/sandbox/test_sandbox_factory.py (1)

23-24: ⚠️ Potential issue | 🟡 Minor: Timeout markers are still missing on the unit test classes. Please add `@pytest.mark.timeout(30)` to each test class.

✅ Suggested change

```diff
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestBuildSandboxBackends:
@@
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestResolveSandboxForCategory:
@@
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestCleanupSandboxBackends:
```

As per coding guidelines, "Set 30 second timeout per test".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/tools/sandbox/test_sandbox_factory.py` around lines 23 - 24, The test classes lack the required timeout decorator; add `@pytest.mark.timeout`(30) above each unit test class definition (e.g., add it above class TestBuildSandboxBackends and the other test classes referenced) so every test class has a 30-second timeout; ensure you import pytest if not already present and place the decorator directly above the class declaration to apply the timeout to all contained tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/tools/factory.py`:
- Around line 146-155: The current code eagerly calls build_sandbox_backends
when sandbox_backends is None, which constructs all configured backends
(including Docker) even though only ToolCategory.VERSION_CONTROL is needed;
change the flow to only build or resolve the backends required for
VERSION_CONTROL: either pass sandbox_backends=None into
resolve_sandbox_for_category so it lazily constructs only the requested backend,
or update/build a new helper (e.g., build_sandbox_backends_for_category or
extend build_sandbox_backends with a categories/categories_allowed parameter)
and call that with category=ToolCategory.VERSION_CONTROL (keeping
config=config.sandboxing and workspace=workspace) before calling
resolve_sandbox_for_category; update references to sandbox_backends,
build_sandbox_backends, resolve_sandbox_for_category, and
ToolCategory.VERSION_CONTROL accordingly.
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 171-174: Replace the use of asyncio.gather for the fan-out/fan-in
cleanup with structured concurrency using asyncio.TaskGroup: inside the async
function where backend_items and _cleanup_one are used, create an
asyncio.TaskGroup, start tasks for each (_cleanup_one(n, b)) via
tg.create_task(...) (or assign to a locals list if you need results), await the
TaskGroup context so exceptions propagate in a structured way, and collect
results or handle exceptions after the with asyncio.TaskGroup() as tg block;
update any code that expects the previous results variable accordingly and
ensure return_exceptions semantics are handled explicitly (e.g., try/except per
task) if you need to preserve them.
In `@tests/unit/tools/test_factory_sandbox_wiring.py`:
- Around line 49-50: Add an explicit 30s timeout marker to the test class by
stacking `@pytest.mark.timeout(30)` with the existing `@pytest.mark.unit` on the
TestFactorySandboxWiring class; update the decorator list above class
TestFactorySandboxWiring so both `@pytest.mark.unit` and `@pytest.mark.timeout(30)`
appear (order doesn't matter) to enforce the 30-second timeout per test.
---
Duplicate comments:
In `@tests/unit/tools/sandbox/test_sandbox_factory.py`:
- Around line 23-24: The test classes lack the required timeout decorator; add
`@pytest.mark.timeout(30)` above each unit test class definition (e.g., add it
above class TestBuildSandboxBackends and the other test classes referenced) so
every test class has a 30-second timeout; ensure you import pytest if not
already present and place the decorator directly above the class declaration to
apply the timeout to all contained tests.
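One way to implement the lazy build the factory.py prompt above describes, as a hedged sketch with stand-in names (`build_sandbox_backends_for_category`, the `BUILDERS` dict, and both classes are hypothetical; extending `build_sandbox_backends` with a `categories` parameter is the other route the prompt mentions):

```python
from enum import Enum

class ToolCategory(Enum):
    VERSION_CONTROL = "version_control"

class SandboxingConfig:
    def __init__(self, default_backend: str, overrides: dict[str, str]) -> None:
        self.default_backend = default_backend
        self.overrides = overrides

    def get_backend_for_category(self, category: str) -> str:
        return self.overrides.get(category, self.default_backend)

# Stand-in constructors; the real ones take config= and workspace= arguments.
BUILDERS = {
    "subprocess": lambda: "subprocess-backend",
    "docker": lambda: "docker-backend",
}

def build_sandbox_backends_for_category(
    config: SandboxingConfig, category: ToolCategory
) -> dict[str, str]:
    """Construct only the backend this one category resolves to."""
    name = config.get_backend_for_category(category.value)
    return {name: BUILDERS[name]()}

cfg = SandboxingConfig("subprocess", {})
backends = build_sandbox_backends_for_category(cfg, ToolCategory.VERSION_CONTROL)
print(backends)  # → {'subprocess': 'subprocess-backend'} — docker never built
```

Resolving the name first and constructing only that backend avoids paying Docker startup cost when only the subprocess backend is ever used.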
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 685192b9-459e-4520-8fdc-7df7305c112a
📒 Files selected for processing (7)
- docs/architecture/tech-stack.md
- docs/design/operations.md
- src/synthorg/observability/events/sandbox.py
- src/synthorg/tools/factory.py
- src/synthorg/tools/sandbox/factory.py
- tests/unit/tools/sandbox/test_sandbox_factory.py
- tests/unit/tools/test_factory_sandbox_wiring.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

For Markdown documentation, always read the relevant `docs/design/` page when implementing features to ensure alignment with design specifications

Files:
- docs/design/operations.md
- docs/architecture/tech-stack.md
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Always read the relevant `docs/design/` page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (`frozen=True`, `ConfigDict`) for config and identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves
Use Pydantic v2 conventions: `@computed_field` for derived values instead of storing + validating redundant fields (e.g., `TokenUsage.total_tokens`)
Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...
Files:
- src/synthorg/observability/events/sandbox.py
- src/synthorg/tools/factory.py
- src/synthorg/tools/sandbox/factory.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use `cd` in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly
Files:
- src/synthorg/observability/events/sandbox.py
- tests/unit/tools/test_factory_sandbox_wiring.py
- tests/unit/tools/sandbox/test_sandbox_factory.py
- src/synthorg/tools/factory.py
- src/synthorg/tools/sandbox/factory.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Maintain 80% minimum code coverage (enforced in CI)
Use `asyncio_mode = "auto"` — no manual `@pytest.mark.asyncio` needed
Set 30 second timeout per test
ALWAYS include `-n auto` when running pytest with pytest-xdist, never run tests sequentially
Prefer `@pytest.mark.parametrize` for testing similar cases
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests. Use generic names: `test-provider`, `test-small-001`, etc. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/skill/agent files, (3) third-party import paths/module names
Use Hypothesis for property-based testing with `@given` + `@settings`. Control via `HYPOTHESIS_PROFILE` env var: `ci` (200 examples, default) or `dev` (1000 examples). Run dev profile: `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Files:
- tests/unit/tools/test_factory_sandbox_wiring.py
- tests/unit/tools/sandbox/test_sandbox_factory.py
🧠 Learnings (9)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/sandbox.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.
Applied to files:
tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to tests/**/*.py : Set 30 second timeout per test
Applied to files:
tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to tests/**/*.py : Use pytest markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`
Applied to files:
tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/tools/sandbox/test_sandbox_factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement
Applied to files:
src/synthorg/tools/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
docs/architecture/tech-stack.md
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Applied to files:
docs/architecture/tech-stack.md
🧬 Code graph analysis (2)
tests/unit/tools/test_factory_sandbox_wiring.py (2)
src/synthorg/config/schema.py (1)
- RootConfig (387-651)
src/synthorg/tools/factory.py (1)
- build_default_tools_from_config (158-219)
src/synthorg/tools/factory.py (2)
src/synthorg/core/enums.py (1)
- ToolCategory (294-308)
src/synthorg/tools/sandbox/factory.py (2)
- build_sandbox_backends (37-92)
- resolve_sandbox_for_category (95-140)
🔇 Additional comments (3)
src/synthorg/observability/events/sandbox.py (1)
17-21: Event constants are well-scoped and consistent. These additions follow the existing sandbox event naming scheme and are ready to use in structured logging paths.
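For orientation, a minimal sketch of what `src/synthorg/observability/events/sandbox.py` plausibly contains. Only the names `SANDBOX_FACTORY_BUILT`, `SANDBOX_FACTORY_BUILD_FAILED`, and `SANDBOX_FACTORY_CLEANUP_FAILED` appear elsewhere in this review; the string values (and any other names) are invented for illustration:

```python
# Hypothetical sketch — string values are assumptions, not the repo's
# actual definitions. Per project convention these constants are imported
# directly: `from synthorg.observability.events.sandbox import ...`.
SANDBOX_FACTORY_BUILT = "sandbox.factory.built"
SANDBOX_FACTORY_BUILD_FAILED = "sandbox.factory.build_failed"
SANDBOX_FACTORY_CLEANUP_FAILED = "sandbox.factory.cleanup_failed"
```

Keeping all factory events under one prefix makes them trivially filterable in structured-log queries.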
docs/design/operations.md (1)
466-471: Documentation update is clear and accurately scoped. This captures the factory responsibilities and current category wiring status without over-claiming.
docs/architecture/tech-stack.md (1)
114-114: Tech-stack convention row is aligned with the new sandbox wiring model.
```python
if sandbox_backends is None:
    sandbox_backends = build_sandbox_backends(
        config=config.sandboxing,
        workspace=workspace,
    )
return resolve_sandbox_for_category(
    config=config.sandboxing,
    backends=sandbox_backends,
    category=ToolCategory.VERSION_CONTROL,
)
```
Avoid instantiating unrelated backends in the VERSION_CONTROL-only path.
When sandbox_backends is None, Line 147 builds all configured backends (default + every override), but this code path only consumes ToolCategory.VERSION_CONTROL. That can eagerly construct unused backends (notably Docker), adding avoidable startup failures/dependencies for git-only tool builds.
♻️ Proposed fix

```diff
 def _resolve_vc_sandbox(
     *,
     config: RootConfig,
     sandbox_backends: Mapping[str, SandboxBackend] | None,
     workspace: Path,
 ) -> SandboxBackend:
 @@
-    if sandbox_backends is None:
-        sandbox_backends = build_sandbox_backends(
-            config=config.sandboxing,
-            workspace=workspace,
-        )
+    if sandbox_backends is None:
+        vc_backend_name = config.sandboxing.backend_for_category(
+            ToolCategory.VERSION_CONTROL.value,
+        )
+        vc_only_config = config.sandboxing.model_copy(
+            update={
+                "default_backend": vc_backend_name,
+                "overrides": {},
+            },
+        )
+        sandbox_backends = build_sandbox_backends(
+            config=vc_only_config,
+            workspace=workspace,
+        )
```

As per coding guidelines, "If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/tools/factory.py` around lines 146 - 155, The current code
eagerly calls build_sandbox_backends when sandbox_backends is None, which
constructs all configured backends (including Docker) even though only
ToolCategory.VERSION_CONTROL is needed; change the flow to only build or resolve
the backends required for VERSION_CONTROL: either pass sandbox_backends=None
into resolve_sandbox_for_category so it lazily constructs only the requested
backend, or update/build a new helper (e.g., build_sandbox_backends_for_category
or extend build_sandbox_backends with a categories/categories_allowed parameter)
and call that with category=ToolCategory.VERSION_CONTROL (keeping
config=config.sandboxing and workspace=workspace) before calling
resolve_sandbox_for_category; update references to sandbox_backends,
build_sandbox_backends, resolve_sandbox_for_category, and
ToolCategory.VERSION_CONTROL accordingly.
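The category-scoped narrowing the prompt describes can be modeled with plain stand-ins. `ToySandboxingConfig` here is illustrative only, not the real frozen Pydantic schema; the point is that resolving the category's backend name *before* construction means a git-only build never touches the Docker constructor:

```python
class ToySandboxingConfig:
    """Toy stand-in for the real SandboxingConfig (assumption, not the repo's model)."""

    def __init__(self, default_backend: str, overrides: dict[str, str]) -> None:
        self.default_backend = default_backend
        self.overrides = overrides

    def backend_for_category(self, category: str) -> str:
        # Per-category override, falling back to the default backend.
        return self.overrides.get(category, self.default_backend)


def backends_needed_for_category(config: ToySandboxingConfig, category: str) -> set[str]:
    # Narrowed build set: only the single backend this category resolves to,
    # instead of default + every override.
    return {config.backend_for_category(category)}


cfg = ToySandboxingConfig("docker", {"version_control": "subprocess"})
assert backends_needed_for_category(cfg, "version_control") == {"subprocess"}
assert backends_needed_for_category(cfg, "file_ops") == {"docker"}
```

The same narrowing falls out of the proposed `model_copy(update={"default_backend": ..., "overrides": {}})` approach on the real config.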
```python
results = await asyncio.gather(
    *(_cleanup_one(n, b) for n, b in backend_items),
    return_exceptions=True,
)
```
🧹 Nitpick | 🔵 Trivial
Use TaskGroup here (or explicitly document the exception).
Line 171 currently uses asyncio.gather(...) for a new fan-out/fan-in path. Project convention prefers structured concurrency with asyncio.TaskGroup for this pattern.
As per coding guidelines, "Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code... Prefer structured concurrency over bare create_task".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/tools/sandbox/factory.py` around lines 171 - 174, Replace the
use of asyncio.gather for the fan-out/fan-in cleanup with structured concurrency
using asyncio.TaskGroup: inside the async function where backend_items and
_cleanup_one are used, create an asyncio.TaskGroup, start tasks for each
(_cleanup_one(n, b)) via tg.create_task(...) (or assign to a locals list if you
need results), await the TaskGroup context so exceptions propagate in a
structured way, and collect results or handle exceptions after the with
asyncio.TaskGroup() as tg block; update any code that expects the previous
results variable accordingly and ensure return_exceptions semantics are handled
explicitly (e.g., try/except per task) if you need to preserve them.
```python
@pytest.mark.unit
class TestFactorySandboxWiring:
```
Add an explicit 30s timeout marker for this test class.
Please stack @pytest.mark.timeout(30) with the existing unit marker.
✅ Suggested change

```diff
 @pytest.mark.unit
+@pytest.mark.timeout(30)
 class TestFactorySandboxWiring:
```

As per coding guidelines, "Set 30 second timeout per test".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/test_factory_sandbox_wiring.py` around lines 49 - 50, Add an
explicit 30s timeout marker to the test class by stacking
`@pytest.mark.timeout`(30) with the existing `@pytest.mark.unit` on the
TestFactorySandboxWiring class; update the decorator list above class
TestFactorySandboxWiring so both `@pytest.mark.unit` and `@pytest.mark.timeout`(30)
appear (order doesn't matter) to enforce the 30-second timeout per test.
Add comment explaining why cleanup_sandbox_backends uses asyncio.gather(return_exceptions=True) instead of TaskGroup: TaskGroup cancels sibling tasks on BaseException, defeating the error-isolation guarantee during shutdown.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 69-70: The current error path logs the success event
SANDBOX_FACTORY_BUILT which skews telemetry; change the logger.error call in the
error branch that now uses SANDBOX_FACTORY_BUILT to emit a failure-specific
event (e.g., SANDBOX_FACTORY_BUILD_FAILED) or include an explicit failure status
field on the emitted event so telemetry can filter it; update the event constant
declaration (and any exports) and replace the reference in the failing branch
that calls logger.error(SANDBOX_FACTORY_BUILT, error=msg) so it uses the new
failure constant (or adds status="failed") before raising ValueError(msg).
- Around line 74-84: The factory currently instantiates SubprocessSandbox and
DockerSandbox directly so if their constructors raise the exception is
propagated without backend-specific context; wrap each constructor call (the
SubprocessSandbox call that uses config.subprocess and workspace, and the
DockerSandbox call that uses config.docker and workspace) in a try/except that
logs a structured ERROR (include backend name and relevant config reference)
before re-raising the exception, then assign to backends["subprocess"] /
backends["docker"] as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b14f7f0b-8711-4da4-bfd0-0b6c6bcf13a2
📒 Files selected for processing (1)
src/synthorg/tools/sandbox/factory.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2) — same src/synthorg/**/*.py and **/*.py guidelines as above, applied to src/synthorg/tools/sandbox/factory.py
🧠 Learnings (2)
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
src/synthorg/tools/sandbox/factory.py
🧬 Code graph analysis (1)
src/synthorg/tools/sandbox/factory.py (2)
tests/unit/tools/sandbox/conftest.py (1)
- subprocess_sandbox (24-32)
src/synthorg/core/enums.py (1)
- ToolCategory (294-308)
🔇 Additional comments (2)
src/synthorg/tools/sandbox/factory.py (2)
95-140: Category resolution path looks solid. Good contextual warning on missing backend and clear debug logging on successful resolution.
170-188: Cleanup error-isolation behavior is well implemented. The documented rationale plus per-backend isolation/logging aligns with the best-effort shutdown contract.
- Add SANDBOX_FACTORY_BUILD_FAILED event constant so the error path in build_sandbox_backends does not reuse the success event
- Wrap SubprocessSandbox/DockerSandbox constructors in try/except to log backend-specific context before re-raising
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/synthorg/tools/sandbox/factory.py (1)
189-198: ⚠️ Potential issue | 🟠 Major

`cleanup_sandbox_backends` still diverges from the TaskGroup requirement.

Line 195 keeps `asyncio.gather(...)` in a new fan-out cleanup path. This was previously flagged and is still a divergence from the stated TaskGroup direction for this feature; if intentional, it needs explicit spec approval/update.

🔧 TaskGroup-compatible pattern (while preserving per-backend isolation)

```diff
-    backend_items = list(backends.items())
-    results = await asyncio.gather(
-        *(_cleanup_one(n, b) for n, b in backend_items),
-        return_exceptions=True,
-    )
-    # Log BaseException subclasses (CancelledError, KeyboardInterrupt)
-    # that escaped _cleanup_one's except Exception block
-    for (name, _), result in zip(backend_items, results, strict=True):
-        if isinstance(result, BaseException):
-            logger.error(
-                SANDBOX_FACTORY_CLEANUP_FAILED,
-                backend=name,
-                error=f"unhandled exception during cleanup: {result!r}",
-            )
+    async def _cleanup_one_isolated(name: str, backend: SandboxBackend) -> None:
+        try:
+            await _cleanup_one(name, backend)
+        except asyncio.CancelledError:
+            logger.warning(
+                SANDBOX_FACTORY_CLEANUP_FAILED,
+                backend=name,
+                error="cleanup task cancelled",
+                exc_info=True,
+            )
+
+    async with asyncio.TaskGroup() as tg:
+        for name, backend in backends.items():
+            tg.create_task(_cleanup_one_isolated(name, backend))
```

Use this script to confirm implementation/spec alignment:

```shell
#!/bin/bash
rg -n "cleanup_sandbox_backends|asyncio\.gather|TaskGroup" src/synthorg/tools/sandbox/factory.py
rg -n "TaskGroup|sandboxing|11\.1\.2|15\.5" docs CLAUDE.md || true
```

As per coding guidelines, "Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code ... Prefer structured concurrency over bare `create_task`" and "If implementation deviates from the design spec, alert the user and explain why".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/tools/sandbox/factory.py` around lines 189 - 198, The current cleanup uses asyncio.gather inside cleanup_sandbox_backends which deviates from the TaskGroup requirement; replace the gather call with an asyncio.TaskGroup and create a separate task for each backend using tg.create_task, but wrap each call to _cleanup_one(n, b) in a small try/except wrapper coroutine that catches and records exceptions (so an exception in one backend does not cancel sibling tasks); collect results/exceptions into the same results list or a dict keyed by backend name (use backend_items, _cleanup_one, and cleanup_sandbox_backends to locate code) and ensure return_exceptions-like behavior by storing exceptions rather than letting them propagate out of the TaskGroup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 38-111: The build_sandbox_backends function exceeds the 50-line
limit; refactor by extracting validation and instantiation into small helpers:
create a validate_backends(needed: set[str]) function that checks needed against
_KNOWN_BACKENDS and logs/raises using SANDBOX_FACTORY_BUILD_FAILED, and an
instantiate_backend(name: str, config, workspace: Path) -> SandboxBackend helper
that constructs SubprocessSandbox or DockerSandbox, logs errors with
SANDBOX_FACTORY_BUILD_FAILED and re-raises; then have build_sandbox_backends
compute needed, call validate_backends(needed), loop needed to call
instantiate_backend for each required backend, collect into backends, and
finally log SANDBOX_FACTORY_BUILT and return MappingProxyType(backends).
---
Duplicate comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 189-198: The current cleanup uses asyncio.gather inside
cleanup_sandbox_backends which deviates from the TaskGroup requirement; replace
the gather call with an asyncio.TaskGroup and create a separate task for each
backend using tg.create_task, but wrap each call to _cleanup_one(n, b) in a
small try/except wrapper coroutine that catches and records exceptions (so an
exception in one backend does not cancel sibling tasks); collect
results/exceptions into the same results list or a dict keyed by backend name
(use backend_items, _cleanup_one, and cleanup_sandbox_backends to locate code)
and ensure return_exceptions-like behavior by storing exceptions rather than
letting them propagate out of the TaskGroup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6816dbe0-d6db-400d-bcad-d643370c8b19
📒 Files selected for processing (2)
- src/synthorg/observability/events/sandbox.py
- src/synthorg/tools/sandbox/factory.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2) — same src/synthorg/**/*.py and **/*.py guidelines as above, applied to src/synthorg/tools/sandbox/factory.py and src/synthorg/observability/events/sandbox.py
🧠 Learnings (5)
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/sandbox.py
🧬 Code graph analysis (1)
src/synthorg/tools/sandbox/factory.py (2)
src/synthorg/tools/sandbox/docker_sandbox.py (1)
- cleanup (601-620)
src/synthorg/core/enums.py (1)
- ToolCategory (294-308)
🔇 Additional comments (2)
src/synthorg/observability/events/sandbox.py (1)
17-22: Good additive sandbox factory event coverage. These constants cleanly cover build/resolve/cleanup success and failure paths and fit the existing event taxonomy.
src/synthorg/tools/sandbox/factory.py (1)
64-103: Strong error-path observability in backend construction. Validation and constructor failure branches log structured context before re-raising, which improves startup diagnostics.
```python
def build_sandbox_backends(
    *,
    config: SandboxingConfig,
    workspace: Path,
) -> MappingProxyType[str, SandboxBackend]:
    """Build only the backend instances actually referenced by *config*.

    Collects which backend names are needed (the default plus all
    override values), then instantiates ``SubprocessSandbox`` and/or
    ``DockerSandbox`` with their respective sub-configs.

    Args:
        config: Top-level sandboxing configuration.
        workspace: Absolute path to the agent workspace root.

    Returns:
        A read-only mapping of backend name to backend instance.
        Only contains keys for backends that are actually referenced.

    Raises:
        ValueError: If *config* references an unrecognized backend
            name not in the known set (``subprocess``, ``docker``).
    """
    needed: set[str] = {config.default_backend}
    needed.update(config.overrides.values())

    unknown = needed - _KNOWN_BACKENDS
    if unknown:
        msg = (
            f"Unrecognized sandbox backend name(s): {sorted(unknown)}; "
            f"known backends: {sorted(_KNOWN_BACKENDS)}"
        )
        logger.error(SANDBOX_FACTORY_BUILD_FAILED, error=msg)
        raise ValueError(msg)

    backends: dict[str, SandboxBackend] = {}

    if "subprocess" in needed:
        try:
            backends["subprocess"] = SubprocessSandbox(
                config=config.subprocess,
                workspace=workspace,
            )
        except Exception:
            logger.error(
                SANDBOX_FACTORY_BUILD_FAILED,
                backend="subprocess",
                workspace=str(workspace),
                exc_info=True,
            )
            raise

    if "docker" in needed:
        try:
            backends["docker"] = DockerSandbox(
                config=config.docker,
                workspace=workspace,
            )
        except Exception:
            logger.error(
                SANDBOX_FACTORY_BUILD_FAILED,
                backend="docker",
                workspace=str(workspace),
                exc_info=True,
            )
            raise

    logger.info(
        SANDBOX_FACTORY_BUILT,
        backends=sorted(backends.keys()),
        default=config.default_backend,
        override_count=len(config.overrides),
    )
    return MappingProxyType(backends)
```
🛠️ Refactor suggestion | 🟠 Major
Refactor build_sandbox_backends to stay within function-size limits.
This function is currently over the project’s 50-line limit; split validation/instantiation into helpers to keep the public function focused.
♻️ Proposed refactor split
+def _collect_needed_backends(config: SandboxingConfig) -> set[str]:
+ needed: set[str] = {config.default_backend}
+ needed.update(config.overrides.values())
+ return needed
+
+def _validate_needed_backends(needed: set[str]) -> None:
+ unknown = needed - _KNOWN_BACKENDS
+ if unknown:
+ msg = (
+ f"Unrecognized sandbox backend name(s): {sorted(unknown)}; "
+ f"known backends: {sorted(_KNOWN_BACKENDS)}"
+ )
+ logger.error(SANDBOX_FACTORY_BUILD_FAILED, error=msg)
+ raise ValueError(msg)
+
def build_sandbox_backends(
@@
- needed: set[str] = {config.default_backend}
- needed.update(config.overrides.values())
-
- unknown = needed - _KNOWN_BACKENDS
- if unknown:
- msg = (
- f"Unrecognized sandbox backend name(s): {sorted(unknown)}; "
- f"known backends: {sorted(_KNOWN_BACKENDS)}"
- )
- logger.error(SANDBOX_FACTORY_BUILD_FAILED, error=msg)
- raise ValueError(msg)
+ needed = _collect_needed_backends(config)
+ _validate_needed_backends(needed)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/tools/sandbox/factory.py` around lines 38 - 111, The
build_sandbox_backends function exceeds the 50-line limit; refactor by
extracting validation and instantiation into small helpers: create a
validate_backends(needed: set[str]) function that checks needed against
_KNOWN_BACKENDS and logs/raises using SANDBOX_FACTORY_BUILD_FAILED, and an
instantiate_backend(name: str, config, workspace: Path) -> SandboxBackend helper
that constructs SubprocessSandbox or DockerSandbox, logs errors with
SANDBOX_FACTORY_BUILD_FAILED and re-raises; then have build_sandbox_backends
compute needed, call validate_backends(needed), loop needed to call
instantiate_backend for each required backend, collect into backends, and
finally log SANDBOX_FACTORY_BUILT and return MappingProxyType(backends).
…ackends under 50 lines
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/tools/sandbox/factory.py`:
- Around line 48-56: The dispatch currently treats any non-"subprocess" backend
as Docker, which is fragile; modify the branch that uses the name variable so it
explicitly checks for both "subprocess" and "docker" (e.g., if name ==
"subprocess": return SubprocessSandbox(...); elif name == "docker": return
DockerSandbox(...)) and raise a clear exception (ValueError or custom) for any
other name instead of falling back to Docker; keep using config.subprocess,
config.docker and workspace when constructing the respective SubprocessSandbox
and DockerSandbox instances so callers get deterministic behavior and clear
errors for unsupported backends.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e9535369-8af8-4cbe-8a21-2edfe797046c
📒 Files selected for processing (1)
src/synthorg/tools/sandbox/factory.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Deploy Preview
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`src/synthorg/**/*.py`: Always read the relevant `docs/design/` page before implementing any feature or planning any issue
If implementation deviates from the design spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval
No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
All public functions must have type hints; use mypy strict mode for type checking
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Create new objects for immutability, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models (`frozen=True`, `ConfigDict`) for config and identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves
Use Pydantic v2 conventions: `@computed_field` for derived values instead of storing + validating redundant fields (e.g., `TokenUsage.total_tokens`)
Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow excepti...
Files:
src/synthorg/tools/sandbox/factory.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use `cd` in Bash commands within code or documentation — the working directory is already set to the project root. Use absolute paths or run commands directly
Files:
src/synthorg/tools/sandbox/factory.py
🧠 Learnings (5)
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/tools/sandbox/factory.py
📚 Learning: 2026-03-17T22:48:14.482Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:48:14.482Z
Learning: Applies to src/synthorg/**/*.py : Keep functions under 50 lines and files under 800 lines
Applied to files:
src/synthorg/tools/sandbox/factory.py
🧬 Code graph analysis (1)
src/synthorg/tools/sandbox/factory.py (3)
src/synthorg/tools/sandbox/docker_sandbox.py (4)
  `DockerSandbox` (70-647), `config` (113-115), `workspace` (118-120), `cleanup` (601-620)
tests/unit/tools/sandbox/conftest.py (1)
  `subprocess_sandbox` (24-32)
src/synthorg/core/enums.py (1)
  `ToolCategory` (294-308)
🔇 Additional comments (4)
src/synthorg/tools/sandbox/factory.py (4)
13-35: Observability wiring looks solid. Using `get_logger(__name__)` with domain event constants here is consistent and clean.
67-113: `build_sandbox_backends` implementation matches the objective well. Validation, selective instantiation, and read-only return via `MappingProxyType` are all correctly handled.
115-160: Category resolution path is robust and well-instrumented. The missing-backend error path includes useful context and preserves failure semantics via `KeyError`.
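The per-category resolution the reviewer is praising can be sketched as follows; the function name mirrors the PR, but the signature, the `ToolCategory` subset, and the overrides-keyed-by-category-value convention are illustrative assumptions, not the project's actual API:

```python
# Illustrative sketch of per-category backend resolution. The overrides mapping,
# enum subset, and signature are assumptions; only the name comes from the PR.
from collections.abc import Mapping
from enum import Enum


class ToolCategory(Enum):  # stand-in subset of the project's enum
    VERSION_CONTROL = "version_control"
    FILE_SYSTEM = "file_system"


def resolve_sandbox_for_category(
    default_backend: str,
    overrides: Mapping[str, str],
    backends: Mapping[str, object],
    category: ToolCategory,
) -> object:
    """Pick the override for this category if present, else the default."""
    name = overrides.get(category.value, default_backend)
    if name not in backends:
        # Descriptive KeyError with context, matching the review's observation.
        raise KeyError(
            f"No sandbox backend {name!r} built for category {category.value!r}; "
            f"available: {sorted(backends)}"
        )
    return backends[name]
```

The key property is that a misconfiguration surfaces as a descriptive `KeyError` listing the available backends, rather than a bare lookup failure.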
163-208: Cleanup strategy is appropriately resilient for best-effort shutdown. The explicit `gather(return_exceptions=True)` rationale here is clear and preserves per-backend isolation during cleanup.
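The error-isolation pattern the reviewer describes can be sketched like this; the backend class and return shape are illustrative stand-ins, assuming cleanup failures should be reported rather than propagated during best-effort shutdown:

```python
# Illustrative sketch of best-effort parallel cleanup with per-backend error
# isolation via gather(return_exceptions=True). FlakyBackend is a stand-in.
import asyncio


class FlakyBackend:
    def __init__(self, name: str, fail: bool = False) -> None:
        self.name, self.fail = name, fail
        self.cleaned = False

    async def cleanup(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} cleanup failed")
        self.cleaned = True


async def cleanup_sandbox_backends(backends: dict[str, FlakyBackend]) -> list[str]:
    """Clean up every backend; one failure never blocks the others."""
    results = await asyncio.gather(
        *(b.cleanup() for b in backends.values()),
        return_exceptions=True,
    )
    # Report (rather than re-raise) the names whose cleanup failed; the real
    # implementation would log each failure with context instead.
    return [
        name
        for name, result in zip(backends, results)
        if isinstance(result, BaseException)
    ]
```

Unlike `asyncio.TaskGroup`, which cancels siblings on first failure, `gather(return_exceptions=True)` lets every cleanup run to completion, which is the right trade-off for shutdown paths.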
| if name == "subprocess": | ||
| return SubprocessSandbox( | ||
| config=config.subprocess, | ||
| workspace=workspace, | ||
| ) | ||
| return DockerSandbox( | ||
| config=config.docker, | ||
| workspace=workspace, | ||
| ) |
🧹 Nitpick | 🔵 Trivial
Make backend dispatch explicit instead of implicit docker fallback.
Any non-"subprocess" value currently routes to DockerSandbox. That’s fragile if this helper is reused without prior validation.
♻️ Proposed defensive dispatch
try:
if name == "subprocess":
return SubprocessSandbox(
config=config.subprocess,
workspace=workspace,
)
- return DockerSandbox(
- config=config.docker,
- workspace=workspace,
- )
+ if name == "docker":
+ return DockerSandbox(
+ config=config.docker,
+ workspace=workspace,
+ )
+ msg = f"Unsupported sandbox backend: {name!r}"
+ logger.error(SANDBOX_FACTORY_BUILD_FAILED, backend=name, error=msg)
+ raise ValueError(msg)
except Exception:
logger.error(
SANDBOX_FACTORY_BUILD_FAILED,
backend=name,
workspace=str(workspace),
exc_info=True,
)
    raise

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if name == "subprocess": | |
| return SubprocessSandbox( | |
| config=config.subprocess, | |
| workspace=workspace, | |
| ) | |
| return DockerSandbox( | |
| config=config.docker, | |
| workspace=workspace, | |
| ) | |
| if name == "subprocess": | |
| return SubprocessSandbox( | |
| config=config.subprocess, | |
| workspace=workspace, | |
| ) | |
| if name == "docker": | |
| return DockerSandbox( | |
| config=config.docker, | |
| workspace=workspace, | |
| ) | |
| msg = f"Unsupported sandbox backend: {name!r}" | |
| logger.error(SANDBOX_FACTORY_BUILD_FAILED, backend=name, error=msg) | |
| raise ValueError(msg) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/tools/sandbox/factory.py` around lines 48 - 56, The dispatch
currently treats any non-"subprocess" backend as Docker, which is fragile;
modify the branch that uses the name variable so it explicitly checks for both
"subprocess" and "docker" (e.g., if name == "subprocess": return
SubprocessSandbox(...); elif name == "docker": return DockerSandbox(...)) and
raise a clear exception (ValueError or custom) for any other name instead of
falling back to Docker; keep using config.subprocess, config.docker and
workspace when constructing the respective SubprocessSandbox and DockerSandbox
instances so callers get deterministic behavior and clear errors for unsupported
backends.
Add explicit check before try block so unknown names raise ValueError instead of silently falling through to DockerSandbox.
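The ordering the author describes matters: if the membership check lived inside the `try`, the broad `except Exception` would log the unknown name as a construction failure before re-raising. A tiny illustration of the fix, with stand-in construction logic:

```python
# Tiny illustration of validating before the try block, so an unknown name
# raises ValueError directly instead of routing through the construction
# error path. The "built:" return value is a stand-in for a real backend.
def build_one(name: str) -> str:
    if name not in {"subprocess", "docker"}:  # explicit check, outside try
        raise ValueError(f"Unsupported sandbox backend: {name!r}")
    try:
        return f"built:{name}"  # stand-in for real backend construction
    except Exception:
        # the real code logs SANDBOX_FACTORY_BUILD_FAILED here, then re-raises
        raise
```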
🤖 I have created a release *beep* *boop*

## [0.3.3](v0.3.2...v0.3.3) (2026-03-18)

### Features
* **backup:** implement automated backup and restore system (#541) (867b7c1)
* **providers:** runtime provider management with CRUD, presets, and multi-auth (#540) (936c345), closes #451
* **tools:** wire per-category sandbox backend selection (#534) (311a1ab)

### Bug Fixes
* **cli:** switch cosign verification from .sig tags to OCI referrers (#533) (8ee5471), closes #532

### CI/CD
* bump wrangler from 4.74.0 to 4.75.0 in /.github in the minor-and-patch group (#535) (de15867)

### Maintenance
* bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 in /cli in the minor-and-patch group (#536) (4a09aed)
* bump litellm from 1.82.3 to 1.82.4 in the minor-and-patch group (#538) (9f7f83d)
* bump vue-tsc from 3.2.5 to 3.2.6 in /web in the minor-and-patch group across 1 directory (#537) (eb3dc4e)

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop*

## [0.3.3](v0.3.2...v0.3.3) (2026-03-18)

### Features
* **backup:** implement automated backup and restore system (#541) (867b7c1)
* **providers:** runtime provider management with CRUD, presets, and multi-auth (#540) (936c345), closes #451
* **tools:** wire per-category sandbox backend selection (#534) (311a1ab)

### Bug Fixes
* **ci:** add COSIGN_EXPERIMENTAL=1 for OCI referrer mode in cosign sign (#543) (226ed2f)
* **cli:** switch cosign verification from .sig tags to OCI referrers (#533) (8ee5471), closes #532

### CI/CD
* bump wrangler from 4.74.0 to 4.75.0 in /.github in the minor-and-patch group (#535) (de15867)

### Maintenance
* bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 in /cli in the minor-and-patch group (#536) (4a09aed)
* bump litellm from 1.82.3 to 1.82.4 in the minor-and-patch group (#538) (9f7f83d)
* bump vue-tsc from 3.2.5 to 3.2.6 in /web in the minor-and-patch group across 1 directory (#537) (eb3dc4e)
* **main:** release 0.3.3 (#539) (c3de2a2)
* revert v0.3.3 release artifacts (Docker signing failed) (#544) (7f48f52)

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
- Add sandbox backend factory (`sandbox/factory.py`) with three functions: `build_sandbox_backends` (instantiates only needed backends), `resolve_sandbox_for_category` (per-category backend lookup), `cleanup_sandbox_backends` (parallel cleanup via TaskGroup with error isolation)
- Wire into `build_default_tools_from_config` with 3-tier priority: explicit `sandbox` > explicit `sandbox_backends` map > auto-build from `config.sandboxing`
- Resolution currently applies to the `VERSION_CONTROL` category (git tools); other categories will be wired as their tool builders are added
- Add `__init__.py` exports, update CLAUDE.md Package Structure

Closes #265
Test plan
- `build_sandbox_backends` (7 tests: subprocess-only, docker-only, mixed, sub-configs, deduplication, MappingProxyType return)
- `resolve_sandbox_for_category` (4 tests: override, default fallback, all ToolCategory parametrized, KeyError on missing backend)
- `cleanup_sandbox_backends` (4 tests: all backends, empty, single, error isolation)

Review coverage