Skip to content

OpenShell should fail fast on malformed generated commands and hard-abort repeated tool loops #72373

@Kaspnov

Description

@Kaspnov

Summary

OpenShell currently accepts clearly malformed/generated shell commands and hands them to /bin/sh; repeated identical failed tool calls are only warned on rather than hard-aborted. In a chat-surface session this can create a long-running “typing/processing” wedge when a model repeatedly emits malformed commands derived from executable-looking placeholder text.

Observed behaviour

During a chat session using openai-codex/gpt-5.5 as the default model, the assistant repeatedly attempted malformed shell/tool commands. The runtime passed these through far enough that the turn did not fail fast. The loop detector warned after repeated identical tool calls, but did not abort the turn.

Examples of command patterns that should be rejected before shell execution:

  • literal placeholder tokens in executable positions or args, e.g. <name>, <workflow-id>
  • unclosed quotes
  • trailing escapes
  • unterminated backticks / command substitutions

One contributing source was bundled skill/help text containing executable-looking placeholder examples such as workflow install <name> and workflow run <workflow-id> "<task>". The model copied the placeholders literally.

Expected behaviour

OpenShell should fail closed before invoking /bin/sh when the command contains obviously non-runnable generated placeholders or malformed shell syntax.

Repeated identical tool calls in a single turn should hard-abort the turn after the configured threshold rather than only warning, especially when the repeated call is failing or malformed.

Why this matters

This is not model-specific. GPT-5.5 made the issue visible, but the runtime should guard against any model emitting placeholder/syntax-garbage commands. Without fail-fast behaviour, a chat surface can appear wedged and keep showing a long-running processing state.

Local mitigation tested

A local runtime patch added preflight rejection for:

  • literal <placeholder> tokens
  • unclosed quotes
  • trailing backslash escapes
  • unterminated backticks

and changed repeated identical tool-call detection from warn-only to hard-abort. After restart, openclaw health --json returned live/ready and chat replies recovered.

Suggested fix

  1. Add OpenShell preflight validation before shell invocation for generated-command hazards.
  2. Treat repeated identical tool calls as a hard turn-abort after threshold, with a clear user-visible/tool-visible error.
  3. Consider linting bundled skills/help examples so placeholder commands are not presented in a copy-executable form unless clearly quoted as non-runnable syntax.

Environment

  • OpenClaw installed via package manager
  • Host OS: macOS arm64
  • Default model at time: openai-codex/gpt-5.5

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions