Skip to content

fix(agents): strip images on cross-provider fallback retry#43485

Closed
mitchmcalister wants to merge 12 commits intoopenclaw:mainfrom
mitchmcalister:fix/cross-provider-fallback-image-privacy
Closed

fix(agents): strip images on cross-provider fallback retry#43485
mitchmcalister wants to merge 12 commits intoopenclaw:mainfrom
mitchmcalister:fix/cross-provider-fallback-image-privacy

Conversation

@mitchmcalister
Copy link
Copy Markdown
Contributor

Summary

  • Strip user-supplied images when a fallback retry targets a different provider than the primary, preventing unintended third-party disclosure
  • Preserves existing behavior: same-provider format errors still strip images, same-provider non-format retries still preserve them
  • Adds 4 new test cases covering cross-provider and same-provider scenarios

Closes #43481
Related: #43331 (parent behavioral fix that surfaced this during security review)

Test plan

  • pnpm test -- src/commands/agent.test.ts — all resolveRetryImages tests pass (4 new + 6 existing)
  • pnpm format — clean
  • CI green

🤖 Generated with Claude Code

@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations agents Agent runtime and tooling size: M labels Mar 11, 2026
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Mar 11, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Cross-provider model fallback resends full original prompt, causing unintended third-party disclosure
2 🟡 Medium Model fallback retry re-sends original prompt, risking duplicate side-effectful tool execution

1. 🟠 Cross-provider model fallback resends full original prompt, causing unintended third-party disclosure

Property Value
Severity High
CWE CWE-201
Location src/commands/agent.ts:364-367

Description

In runAgentAttempt, fallback retries now always reuse the original user prompt (params.body) as the prompt sent to the model.

This creates a privacy regression when the fallback selects a different provider (cross-provider failover):

  • Input: user prompt text body (may contain sensitive data)
  • Control flow: agentCommandInternal uses runWithModelFallback(...) which can choose cross-provider fallback candidates
  • Sink: runEmbeddedPiAgent({ prompt: effectivePrompt, provider: params.providerOverride, ... })
  • On fallback retries (attempt index > 0), the code always sends the full original prompt to the fallback provider/model

Previously, fallback retries used a generic continuation string ("Continue where you left off...") which reduced disclosure of the full original prompt to a different provider.

Vulnerable code (cross-provider fallback will still receive the full prompt):

// ...
const effectivePrompt = params.body;// ...
return runEmbeddedPiAgent({// ...
  prompt: effectivePrompt,
  provider: params.providerOverride,
  model: params.modelOverride,// ...
});

Even though images are conditionally stripped on cross-provider fallback via resolveRetryImages(...), the text prompt itself is always resent, which can disclose sensitive text to an unintended third-party provider (e.g., local-first primary like Ollama failing over to a hosted provider).

Recommendation

Preserve the reduced-disclosure behavior for cross-provider fallback retries, or require explicit opt-in.

Suggested fix: only resend the original prompt on fallback when the provider is unchanged; otherwise send a generic continuation prompt (or abort unless the user explicitly allows cross-provider disclosure).

Example:

const isCrossProvider =
  normalizeProviderId(params.primaryProvider) !== normalizeProviderId(params.providerOverride);

const effectivePrompt =
  params.isFallbackRetry && isCrossProvider
    ? "Continue where you left off. The previous model attempt failed or timed out."
    : params.body;

Additionally:

  • Consider emitting a user-visible warning/confirmation when a run is about to fall back to a different provider.
  • Document cross-provider fallback disclosure semantics (prompt and attachments).

2. 🟡 Model fallback retry re-sends original prompt, risking duplicate side-effectful tool execution

Property Value
Severity Medium
CWE CWE-840
Location src/commands/agent.ts:364-367

Description

In runAgentAttempt, fallback retries now always re-send the original prompt body (params.body) instead of using a “continue” prompt.

This creates a replay risk when a fallback retry happens after the first attempt already executed tools or produced assistant output, but then throws a FailoverError (triggering cross-model/provider fallback).

Why this is security-relevant:

  • The embedded runner (runEmbeddedAttempt in attempt.ts) executes tool calls (e.g., exec/filesystem/messaging) during activeSession.prompt(...).
  • runEmbeddedPiAgent can throw FailoverError for fallback after an attempt completes with an assistant-side failover error (not only “prompt submission failed before any output”).
  • On the next fallback attempt, re-sending the same prompt as a new user turn can cause the model to re-issue the same tool calls, repeating side effects (duplicate outbound messages, repeated shell commands, repeated edits/writes).

The code comment claims orphaned-user-message repair prevents duplicates, but that repair only removes a trailing orphaned user message when the last persisted entry is role === "user" (i.e., no assistant message yet). It does not prevent duplicates when the prior attempt already persisted an assistant message/tool activity.

Vulnerable change (replay trigger):

// Fallback retries only fire on thrown errors...
const effectivePrompt = params.body;

Relevant supporting behavior:

  • Orphan repair only for trailing user leaf (doesn’t cover non-orphan duplicates): src/agents/pi-embedded-runner/run/attempt.ts around 1693–1707.
  • Fallback can be triggered by thrown FailoverError after assistant-side failures: src/agents/pi-embedded-runner/run.ts throws in the assistant failover path (around 1385–1420) and prompt failover path (around 1266–1280).

Recommendation

Add replay/idempotency protection for fallback retries, because a thrown error can occur after tools have already executed.

Options (pick one or combine):

  1. Do not create a new user turn on fallback retry when the previous attempt already persisted the user turn.

    • Detect the last user message in the session and avoid appending a duplicate on retry.
    • Alternatively, add a retry mode in the runner that reuses the existing user turn and only appends a new assistant turn.
  2. Tool-execution ledger / idempotency keys per run:

    • Persist a per-run idempotency key (e.g., runId) and pass it into side-effectful tools.
    • Tools should refuse to re-execute identical operations for the same runId (or prompt-hash + runId), or at minimum require re-confirmation.

Example direction (pseudo):

// before executing a side-effectful tool
const key = `${runId}:${toolName}:${stableArgsHash}`;
if (ledger.has(key)) return ledger.get(key); // return cached result
ledger.set(key, result);
  1. If keeping the “re-send original prompt” behavior, add an explicit retry guard message to the model (system/tool context) like “This is a retry; do not repeat already-completed actions; inspect the conversation/tool results first.” (This is weaker than (1)/(2) but still helps.)

Analyzed PR: #43485 at commit d6bc239

Last updated on: 2026-03-11T22:43:56Z

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 11, 2026

Greptile Summary

This PR addresses a security gap where user-supplied images were being forwarded to third-party providers during cross-provider fallback retries. It also fixes a related UX regression where fallback retries were replacing the user's original prompt with a generic "Continue where you left off" string.

Key changes:

  • Introduces resolveRetryImages in agent.ts, which strips images only when necessary: always on cross-provider retries (different primaryProvider vs currentProvider), and on same-provider retries only when the prior failure reason was "format". All other same-provider retries now correctly preserve images.
  • Adds previousFailureReason to ModelFallbackRunOptions in model-fallback.ts and threads it through the fallback loop so each retry attempt knows why the previous attempt failed.
  • Removes the resolveFallbackRetryPrompt function, which was substituting the user's original message with a generic string on every retry; the original prompt is now always re-sent (comment in the code explains this is safe because fallback retries only fire on thrown errors, not on partial streaming results, and orphaned-user-message repair handles any duplicate turns).
  • Adds comprehensive unit tests for resolveRetryImages and a previousFailureReason forwarding suite in the model-fallback tests. The integration test for prompt forwarding is a good addition, though it does not include images in the call, leaving the end-to-end image-stripping path without an integration-level assertion (covered by the new unit tests for resolveRetryImages).

Confidence Score: 4/5

  • This PR is safe to merge; the security fix is correct and unit-tested, with a minor gap in integration-test coverage for the image-stripping path.
  • The core logic in resolveRetryImages is correct: cross-provider retries unconditionally strip images, same-provider format errors strip images, and all other same-provider retries preserve them. The previousFailureReason forwarding mechanism in model-fallback.ts is also sound — it is correctly initialised as undefined on the first attempt and only populated after a failure. The prompt-preservation change is well-justified and documented. The main reason for not scoring 5 is that the integration test for prompt forwarding omits images, so there is no end-to-end assertion confirming that images are stripped through the full agentCommand → runAgentAttempt → runEmbeddedPiAgent call chain for the security-critical cross-provider path.
  • No files require special attention beyond the noted test coverage gap in src/commands/agent.test.ts.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/commands/agent.test.ts
Line: 1041-1048

Comment:
**Missing test: cross-provider strip when `previousFailureReason` is `undefined`**

The existing unit tests confirm that cross-provider stripping fires when a `previousFailureReason` is set (e.g., `"rate_limit"`, `"timeout"`). However, there is no test for the case where providers differ but `previousFailureReason` is `undefined`. Reading `resolveRetryImages`, the provider check runs first and is independent of `previousFailureReason`, so the function would still return `undefined`; but adding an explicit case makes the independence explicit and guards against a future refactor that accidentally gates the provider check on `previousFailureReason` being set.

```ts
it("strips images on cross-provider fallback even when previousFailureReason is undefined", () => {
  expect(
    resolveRetryImages(fakeImages, true, undefined, "anthropic", "openai"),
  ).toBeUndefined();
});
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/commands/agent.test.ts
Line: 597-651

Comment:
**Integration test doesn't verify image stripping on cross-provider retry**

The test exercises the cross-provider retry path (`anthropic``openai/gpt-5.2`) and correctly verifies prompt forwarding, but `agentCommand` is called without any `images`, so there is no integration-level assertion that `resolveRetryImages` is actually wired up and stripping images on this path.

Consider passing a `fakeImages` array in the `agentCommand` call and asserting that the first call receives the original images while the second call receives `undefined`. The unit tests for `resolveRetryImages` cover the pure-function behaviour thoroughly, but this integration path (`agentCommand``runAgentAttempt``runEmbeddedPiAgent`) is the security-critical one and currently has no assertion to confirm the plumbing works end-to-end.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: a608220

Comment thread src/commands/agent.test.ts
Comment thread src/commands/agent.test.ts
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6082200e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/commands/agent.ts
@mitchmcalister
Copy link
Copy Markdown
Contributor Author

Addressing Aisle Security Findings

Finding 1 (High) — Cross-provider prompt forwarding

This is an architectural property of the fallback system, not a regression from this PR. Cross-provider fallbacks exist so that when the primary provider fails, the retry can succeed on a different provider — which inherently requires sending the prompt. If operators configure cross-provider fallbacks (e.g. anthropic primary with openai fallback), they are opting into that cross-provider data flow.

Stripping or replacing the prompt on cross-provider retry (as suggested) would make the retry non-functional — the fallback model would have no context to work with.

If there's appetite for a config gate like allowCrossProviderPrompt, that's a separate design discussion about the fallback system's trust model, not something to scope into a targeted image-stripping fix. Per CONTRIBUTING.md: "Keep PRs focused (one thing per PR; do not mix unrelated concerns)."

Finding 2 (Medium) — Prompt-detected images bypass

Already tracked in #43492, filed from review feedback on this PR. The detectAndLoadPromptImages code path in attempt.ts is independent of the images parameter this PR addresses.

@mitchmcalister
Copy link
Copy Markdown
Contributor Author

Converting to draft — this will be superseded by a unified approach that addresses the cross-provider image privacy concern at the session level rather than patching resolveRetryImages in isolation.

See #43331 (comment) for context on the rethink.

mitchmcalister and others added 12 commits March 12, 2026 15:45
Delete no-op resolveFallbackRetryPrompt (inline body directly), simplify
resolveRetryImages to positional args with a single guard, and replace
the effectiveOptions IIFE with a conditional spread one-liner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a fallback retry targets a different provider than the primary,
strip user-supplied images to avoid unintended third-party disclosure.

Closes openclaw#43481
Adds unit test for cross-provider strip with undefined failureReason,
and integration test verifying images are plumbed through agentCommand
and stripped on cross-provider fallback retry end-to-end.
@mitchmcalister
Copy link
Copy Markdown
Contributor Author

Superseded by #44188 (unified fallback retry safety PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling commands Command implementations size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cross-provider fallback should respect provider trust boundaries for images/attachments

1 participant