Summary
When an agent is given a task that includes code blocks intended to be written to files, it sometimes executes the code block contents directly as shell commands instead of writing them to the files. The shell then tries to interpret Python syntax, and the failures cascade.
Observed behaviour
Given a prompt like:
Create sentinel/remediation/actions.py with this content:
```python
from dataclasses import dataclass
from enum import Enum

class RemediationStatus(str, Enum):
    PENDING = 'pending'
```
The agent ran the code directly in the shell:
```
zsh:1: command not found: python
zsh:2: command not found: from
zsh:3: command not found: from
zsh:4: command not found: from
zsh:7: no matches found: RemediationStatus(str, Enum):
zsh:8: command not found: PENDING
```
It should instead have written the content to the file, either with a `write` tool call or with a `cat > file.py << 'EOF'` heredoc.
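For comparison, the intended file-write path can be sketched in Python. This is a minimal illustration, not the agent's actual `write` tool: it assumes a temporary directory stands in for the project root, and the helper names `write_file_from_prompt` and `load_module` are hypothetical. Importing the written file afterwards confirms the block was Python source all along.

```python
import importlib.util
import tempfile
from pathlib import Path

# Content of the code block from the prompt example above.
ACTIONS_PY = """\
from dataclasses import dataclass
from enum import Enum

class RemediationStatus(str, Enum):
    PENDING = 'pending'
"""


def write_file_from_prompt(root: Path, rel_path: str, content: str) -> Path:
    """Hypothetical helper: write a code block to rel_path under root."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)  # create sentinel/remediation/
    target.write_text(content, encoding="utf-8")
    return target


def load_module(path: Path):
    """Hypothetical helper: import the written file to check it is valid Python."""
    spec = importlib.util.spec_from_file_location("actions", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Assumption: a throwaway directory stands in for the real project root.
with tempfile.TemporaryDirectory() as tmp:
    path = write_file_from_prompt(Path(tmp), "sentinel/remediation/actions.py", ACTIONS_PY)
    actions = load_module(path)
    print(actions.RemediationStatus.PENDING.value)  # prints "pending"
```

Writing the file never hands Python syntax to the shell, so the failures shown in the log above cannot occur on this path.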
Root cause hypothesis
The agent conflates "show the code" with "run the code". When the prompt contains a code block adjacent to a file path, the agent should infer that the intent is file creation, not shell execution. That inference should be especially easy when:
- The file extension is `.py`, `.ts`, `.yaml`, etc. (not a shell script)
- The code contains Python/TS syntax that is obviously not valid shell
- The surrounding context says "create" or "implement" rather than "run"
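The three signals above can be checked mechanically. The sketch below is one possible encoding, not an existing agent API: the function `file_creation_signals`, the extension set, the verb sets, and the regular expression are all hypothetical choices made for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical sets; a real agent would tune these.
NON_SHELL_EXTENSIONS = {".py", ".ts", ".tsx", ".js", ".yaml", ".yml", ".json", ".toml"}
CREATE_VERBS = {"create", "implement", "add", "write"}
RUN_VERBS = {"run", "execute", "test"}
# Lines that are valid Python/TS but almost never valid shell.
NON_SHELL_LINE = re.compile(
    r"^\s*(from\s+\S+\s+import\b|import\s+\w|def\s+\w|class\s+\w|interface\s+\w|.*:\s*$)"
)


def file_creation_signals(instruction: str, path: str, block: str) -> dict:
    """Report which of the three file-creation signals apply to a code block."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    return {
        # Signal 1: the target file is not a shell script.
        "non_shell_extension": PurePosixPath(path).suffix in NON_SHELL_EXTENSIONS,
        # Signal 2: the block contains syntax that is obviously not shell.
        "non_shell_syntax": any(NON_SHELL_LINE.match(line) for line in block.splitlines()),
        # Signal 3: the instruction says "create"/"implement", not "run".
        "create_not_run": bool(words & CREATE_VERBS) and not (words & RUN_VERBS),
    }


signals = file_creation_signals(
    "Create sentinel/remediation/actions.py with this content:",
    "sentinel/remediation/actions.py",
    "from dataclasses import dataclass\nfrom enum import Enum\n",
)
print(signals)  # all three signals are True for the prompt in this report
```

Keeping the signals separate, rather than collapsing them into one score, makes it easy to log why a block was classified as file content.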
Expected behaviour
The agent should:
- Recognise that a `python` block adjacent to a file path is file content, not a command
- Use the `write` tool (or equivalent) to create the file
- Only execute shell commands when the intent is clearly execution (e.g. `run`, `execute`, `test`)
- When in doubt, prefer writing to a file over running as a command — the failure mode of writing is recoverable; the failure mode of running arbitrary Python as shell is catastrophic
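A minimal decision rule following these expectations might look like the sketch below. The function `choose_action` and its verb and extension sets are hypothetical; the point is the ordering: a non-shell file path wins, explicit execution verbs come second, and everything else falls back to writing.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical sets for illustration.
SHELL_EXTENSIONS = {".sh", ".bash", ".zsh"}
EXECUTION_VERBS = {"run", "execute", "test"}


def choose_action(instruction: str, path: Optional[str]) -> str:
    """Return "write" or "execute" for a code block, defaulting to "write"."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    # A path with a non-shell extension means the block is file content,
    # even if the instruction also says to run something afterwards.
    if path is not None and PurePosixPath(path).suffix not in SHELL_EXTENSIONS:
        return "write"
    # Execute only when the instruction clearly asks for execution.
    if words & EXECUTION_VERBS:
        return "execute"
    # When in doubt, write: a wrong file is recoverable, a wrong command may not be.
    return "write"


print(choose_action("Create sentinel/remediation/actions.py with this content:",
                    "sentinel/remediation/actions.py"))  # prints "write"
```

Under this rule, "create `tests/test_x.py` and run it" still writes the file first; running it becomes a separate, explicit step.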
Suggested fix
Add a pre-execution heuristic: if the command contains Python/TS/YAML syntax tokens (`import`, `from`, `def`, `class`, `interface`, a trailing `:` at line end, etc.) and does not start with a known shell keyword or binary, warn or refuse to execute and suggest writing to a file instead.
Alternatively, improve the system prompt guidance to explicitly distinguish between "write file" and "execute command" intents when code blocks are present.
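The pre-execution heuristic can be sketched as follows. The names `looks_like_source_code` and `guard_exec` and the allowlist are hypothetical. One subtlety from the log: the failing command's first line appears to have been a bare `python` (zsh:1), probably a leaked fence info string. Because `python` is also a legitimate binary, a first-token check alone would let that command through, so this sketch treats a first line consisting only of a language tag as not a shell start.

```python
import re
from typing import Optional

# Hypothetical allowlist of shell keywords and common binaries.
KNOWN_SHELL_STARTS = {
    "cd", "ls", "cat", "echo", "git", "grep", "mkdir", "cp", "mv", "rm",
    "npm", "node", "pip", "pytest", "python", "python3", "export",
    "if", "for", "while", "case",
}
# Bare fence info strings that may leak in as the first line of a command.
FENCE_TAGS = {"python", "py", "ts", "typescript", "yaml", "yml"}
# Tokens from the suggested fix: import, from, def, class, interface, trailing ':'.
SOURCE_LINE = re.compile(r"^\s*(import|from|def|class|interface)\b|:\s*$")


def looks_like_source_code(command: str) -> bool:
    """Return True if a shell command is probably Python/TS/YAML source."""
    lines = [line for line in command.splitlines() if line.strip()]
    if not lines:
        return False
    first = lines[0].strip()
    # A lone language tag such as "python" is treated as leaked fence text.
    starts_like_shell = first.split()[0] in KNOWN_SHELL_STARTS and first not in FENCE_TAGS
    has_source_tokens = any(SOURCE_LINE.search(line) for line in lines)
    return has_source_tokens and not starts_like_shell


def guard_exec(command: str) -> Optional[str]:
    """Return a warning instead of executing when the command looks like file content."""
    if looks_like_source_code(command):
        return "Refusing to execute: this looks like source code. Write it to a file instead."
    return None


observed = "python\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nclass RemediationStatus(str, Enum):\n    PENDING = 'pending'\n"
print(guard_exec(observed))
```

Commands such as `python -c 'import os'` still pass because they start with a known binary on a line that is more than a bare tag; the heuristic only trips when both conditions from the suggested fix hold.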
Repro context
- Agent: Claude Sonnet 4.6 via OpenClaw main session
- Task: Implement a multi-file Python module from a structured prompt
- Surface: Discord channel (exec via `claude -p --dangerously-skip-permissions` subprocess)
- Prompt style: Inline code blocks with file paths and class definitions
Impact
High — causes complete task failure and requires human intervention to restart. Particularly damaging in multi-file implementation tasks where a single misfire aborts the entire sequence.