fix(agent,docker): halt loop on repeated tool errors + forward docker env/args#35937
Open
jsplec wants to merge 3 commits into
Collaborator
…ode-execution tools
terminal_tool._create_environment() reads docker_env and
docker_extra_args from the container_config dict it is handed, but the
two other tools that can create the shared per-task container build that
dict without those keys:
- tools/file_tools.py (_get_file_ops)
- tools/code_execution_tool.py (_get_or_create_env)
The container is created lazily by whichever tool runs first. If a file
or code-execution op wins the race, the container comes up with
docker_env={} and docker_extra_args=[], silently discarding the user's
configured proxy env, --network, --cap-drop, --security-opt and other
run flags for that container's lifetime. The result is order-dependent
and non-deterministic.
Forward both keys from file_tools and code_execution_tool so all three
container-creating paths build an identical container_config, matching
terminal_tool. No behavior change when the keys are unset.
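A minimal sketch of the forwarding described above, for illustration only. `build_container_config` is a hypothetical helper (the actual dict construction in the three tools is not shown here), and the proxy URL and flag values are made-up sample settings. The point is that every container-creating path derives the same two keys from the same settings, with `{}` / `[]` as the unset defaults.

```python
from typing import Any, Dict


def build_container_config(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shared builder; not the project's actual function."""
    return {
        # The two keys this commit forwards. Copies are returned so callers
        # cannot mutate the shared settings by accident.
        "docker_env": dict(settings.get("docker_env") or {}),
        "docker_extra_args": list(settings.get("docker_extra_args") or []),
    }


# Sample user settings (assumed values, for illustration).
settings = {
    "docker_env": {"HTTPS_PROXY": "http://proxy.example:3128"},
    "docker_extra_args": ["--network=fetcher-net", "--cap-drop=ALL"],
}

# Whichever tool creates the container first, the config is the same.
terminal_cfg = build_container_config(settings)
file_ops_cfg = build_container_config(settings)
code_exec_cfg = build_container_config(settings)
```

With a shared builder (or identical inline dicts, as the commit does), the container no longer depends on which tool runs first.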
terminal_tool._get_env_config() reads every terminal setting from
TERMINAL_* env vars, so config.yaml values must be bridged into env vars
at startup by cli.py (env_mappings) and gateway/run.py
(_terminal_env_map). Both maps bridged docker_volumes, docker_env,
docker_run_as_host_user and friends but omitted docker_extra_args, even
though _get_env_config has always read TERMINAL_DOCKER_EXTRA_ARGS. So
terminal.docker_extra_args in config.yaml was silently ignored at
runtime: the value defaulted to [] regardless of what the user
configured.
Impact: a configured `--network=...`, extra `--cap-drop`, etc. never
reached the container. Observed in a hardened web-profile fetcher meant
to run on an isolated `--network=fetcher-net` that came up on the
default bridge instead.
Add docker_extra_args to both bridge maps and pin it with
test_docker_extra_args_is_bridged_everywhere, mirroring the existing
docker_env / docker_run_as_host_user / docker_mount_cwd_to_workspace
regression guards. Same bug class as those keys.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
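A rough sketch of what the bridge and its regression guard could look like. The two dicts below are hypothetical stand-ins for `env_mappings` and `_terminal_env_map`; `TERMINAL_DOCKER_ENV` is an assumed variable name (only `TERMINAL_DOCKER_EXTRA_ARGS` is named in this commit), and JSON encoding of the value is an assumption about how a list is carried through an env var.

```python
import json
from typing import Any, Dict, Mapping

# Hypothetical stand-ins for the two real bridge maps.
CLI_ENV_MAPPINGS = {
    "docker_env": "TERMINAL_DOCKER_ENV",  # assumed name
    "docker_extra_args": "TERMINAL_DOCKER_EXTRA_ARGS",
}
GATEWAY_TERMINAL_ENV_MAP = {
    "docker_env": "TERMINAL_DOCKER_ENV",  # assumed name
    "docker_extra_args": "TERMINAL_DOCKER_EXTRA_ARGS",
}


def bridge_to_env(terminal_cfg: Mapping[str, Any],
                  mapping: Mapping[str, str]) -> Dict[str, str]:
    """Translate config keys into env vars; unmapped keys are dropped."""
    return {
        var: json.dumps(terminal_cfg[key])  # JSON encoding is an assumption
        for key, var in mapping.items()
        if key in terminal_cfg
    }


def check_docker_extra_args_is_bridged_everywhere() -> None:
    """Guard in the spirit of test_docker_extra_args_is_bridged_everywhere."""
    for mapping in (CLI_ENV_MAPPINGS, GATEWAY_TERMINAL_ENV_MAP):
        assert mapping.get("docker_extra_args") == "TERMINAL_DOCKER_EXTRA_ARGS"


check_docker_extra_args_is_bridged_everywhere()
bridged = bridge_to_env(
    {"docker_extra_args": ["--network=fetcher-net"]}, CLI_ENV_MAPPINGS
)
```

A key missing from either map is dropped by the bridge without any error, which is why the guard checks both maps rather than only one.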
When a model calls a tool with invalid args (e.g. memory replace without
old_text), the tool returns a structured error, but the model retries
the same bad call indefinitely, burning the full iteration budget at
full context cost per retry.
Add a consecutive-identical-error circuit breaker: after the same
(tool_name, error_fingerprint) pair appears 3 times in a row within one
user turn, set _tool_error_loop_halt and break the conversation loop,
following the same pattern as _tool_guardrail_halt_decision. The streak
resets on tool success or at the start of each new user turn.
Observed impact: a memory replace loop consumed ~44k-token context per
retry and ran until killed rather than halting after 3 failures.
4ee8a7d to
1419cda
Compare
This was referenced Jun 10, 2026
1. Docker: forward `docker_env` / `docker_extra_args` (3c8c603, e80104a)

`terminal_tool._create_environment()` reads `docker_env` and `docker_extra_args` from the `container_config` dict it is handed. But the two other tools that can create the shared per-task container build that dict without those keys:
- `tools/file_tools.py` (`_get_file_ops`)
- `tools/code_execution_tool.py` (`_get_or_create_env`)

The container is created lazily by whichever tool runs first (via `_resolve_container_task_id`). If a file or code-execution op wins the race, the container comes up with `docker_env={}` and `docker_extra_args=[]`, silently discarding the user-configured proxy environment, `--network`, `--cap-drop`, `--security-opt` and any other docker run flags for that container's lifetime. The result is order-dependent and non-deterministic from the user's perspective.

Fix

Add `docker_env` and `docker_extra_args` to the `container_config` dict in both `file_tools.py` and `code_execution_tool.py` so all three container-creating paths build an identical config, matching `terminal_tool._create_environment()`.

Impact

Only affects docker-backend profiles that set `docker_env` / `docker_extra_args`. No behavior change when those are unset (defaults `{}` / `[]`). The `local` backend is unaffected.

2. Agent: halt loop on repeated identical tool errors (circuit breaker) (1419cda)

The agent could get stuck calling the same tool with the same malformed arguments in a loop, e.g. `memory(action=replace)` without `old_text`, failing identically every iteration and burning the entire iteration budget without making progress.

Fix

A 3-strike circuit breaker modeled on the existing `_tool_guardrail_halt_decision` pattern. When the same `(tool_name, error_text[:120])` key fails 3 consecutive times, the conversation loop halts instead of spinning.

Files:
- `agent/agent_init.py`: adds `_tool_error_loop_halt` and `_consecutive_tool_error_streak` attributes
- `agent/conversation_loop.py`: per-turn reset of the streak + halt check after `_execute_tool_calls`
- `agent/tool_executor.py`: streak tracking: identical `(tool_name, error_text[:120])` key ≥ 3 → sets the halt flag

Impact

Only triggers on genuinely repeated identical failures. Normal tool use (including the same tool succeeding, or failing with different errors) resets the streak and is unaffected.
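
For reviewers, a self-contained sketch of the 3-strike logic described above. It is not the code in this PR: `ToolErrorBreaker`, `record` and `reset` are hypothetical names, and the real implementation spreads this state across the agent attributes listed under Files. Only the key shape `(tool_name, error_text[:120])` and the threshold of 3 come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STREAK_LIMIT = 3          # consecutive identical failures before halting
FINGERPRINT_CHARS = 120   # error text prefix used as the fingerprint


@dataclass
class ToolErrorBreaker:
    """Hypothetical stand-in for the halt flag plus streak attributes."""
    halt: bool = False
    _key: Optional[Tuple[str, str]] = None
    _count: int = 0

    def reset(self) -> None:
        """Called at the start of each new user turn."""
        self.halt, self._key, self._count = False, None, 0

    def record(self, tool_name: str, error_text: Optional[str]) -> bool:
        """Record one tool result; error_text=None means success."""
        if error_text is None:
            self._key, self._count = None, 0  # any success ends the streak
            return self.halt
        key = (tool_name, error_text[:FINGERPRINT_CHARS])
        if key == self._key:
            self._count += 1
        else:
            self._key, self._count = key, 1  # a different error restarts it
        if self._count >= STREAK_LIMIT:
            self.halt = True
        return self.halt


breaker = ToolErrorBreaker()
same_error = [breaker.record("memory", "replace requires old_text")
              for _ in range(3)]

mixed = ToolErrorBreaker()
mixed.record("memory", "replace requires old_text")
mixed.record("memory", None)  # success resets the streak
mixed_result = [mixed.record("memory", "replace requires old_text")
                for _ in range(2)]
```

In this sketch the third identical failure sets the flag, while a success or a different error in between restarts the count, which matches the Impact note above.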