Summary
OpenShell currently accepts clearly malformed/generated shell commands and hands them to /bin/sh; repeated identical failed tool calls only trigger a warning rather than a hard abort. In a chat-surface session this can create a long-running “typing/processing” wedge when a model repeatedly emits malformed commands derived from executable-looking placeholder text.
Observed behaviour
During a chat session using openai-codex/gpt-5.5 as the default model, the assistant repeatedly attempted malformed shell/tool commands. The runtime passed these through far enough that the turn did not fail fast. The loop detector warned after repeated identical tool calls, but did not abort the turn.
Examples of command patterns that should be rejected before shell execution:
- literal placeholder tokens in executable positions or args, e.g. <name>, <workflow-id>
- unclosed quotes
- trailing escapes
- unterminated backticks / command substitutions
One contributing source was bundled skill/help text containing executable-looking placeholder examples such as workflow install <name> and workflow run <workflow-id> "<task>". The model copied the placeholders literally.
Expected behaviour
OpenShell should fail closed before invoking /bin/sh when the command contains obviously non-runnable generated placeholders or malformed shell syntax.
Repeated identical tool calls in a single turn should hard-abort the turn after the configured threshold rather than only warning, especially when the repeated call is failing or malformed.
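The warn-then-hard-abort behaviour could be sketched as below. The threshold, exception type, and call-key shape are assumptions for illustration, not OpenClaw's actual API:

```python
class RepeatedToolCallAbort(RuntimeError):
    """Raised to hard-abort the turn after too many identical tool calls."""

class LoopDetector:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: dict[str, int] = {}

    def record(self, tool: str, args: str) -> None:
        key = f"{tool}:{args}"
        self.counts[key] = self.counts.get(key, 0) + 1
        n = self.counts[key]
        if n == self.threshold - 1:
            # Warn once before the threshold is reached.
            print(f"warning: tool call repeated {n} times: {key}")
        elif n >= self.threshold:
            # Hard-abort the turn instead of warning again.
            raise RepeatedToolCallAbort(f"aborting turn: {key} repeated {n}x")
```

The key point is the elif branch: once the configured threshold is hit, the turn ends with a visible error rather than continuing to loop.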
Why this matters
This is not model-specific. GPT-5.5 made the issue visible, but the runtime should guard against any model emitting placeholder/syntax-garbage commands. Without fail-fast behaviour, a chat surface can appear wedged and keep showing a long-running processing state.
Local mitigation tested
A local runtime patch added preflight rejection for:
- literal <placeholder> tokens
- unclosed quotes
- trailing backslash escapes
- unterminated backticks
The patch also changed repeated identical tool-call detection from warn-only to hard-abort. After restart, openclaw health --json returned live/ready and chat replies recovered.
Suggested fix
- Add OpenShell preflight validation before shell invocation for generated-command hazards.
- Treat repeated identical tool calls as a hard turn-abort after threshold, with a clear user-visible/tool-visible error.
- Consider linting bundled skills/help examples so placeholder commands are not presented in a copy-executable form unless clearly quoted as non-runnable syntax.
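The third suggestion could be enforced with a simple lint pass over bundled help text. This is a hedged sketch; the flagging heuristic and the exemption prefixes are assumptions, not an existing OpenClaw tool:

```python
import re

# Hypothetical placeholder pattern, matching tokens like <name> or <workflow-id>.
PLACEHOLDER = re.compile(r"<[A-Za-z][A-Za-z0-9_-]*>")

def lint_help_text(text: str) -> list[str]:
    """Return lines that present placeholder commands in copy-executable form."""
    flagged = []
    for line in text.splitlines():
        stripped = line.strip()
        # Heuristic: lines with placeholders that are not clearly quoted as
        # non-runnable syntax (comments, blockquotes) look copy-executable.
        if PLACEHOLDER.search(stripped) and not stripped.startswith(("#", ">")):
            flagged.append(stripped)
    return flagged
```

Running such a lint over skill/help bundles at build time would have caught examples like workflow install <name> before they reached the model.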
Environment
- OpenClaw installed via package manager
- Host OS: macOS arm64
- Default model at the time: openai-codex/gpt-5.5