
fix: fail fast on qa live auth errors #63333

Merged
shakkernerd merged 12 commits into main from fix/qa-live-auth-fail-fast on Apr 8, 2026

Conversation

@shakkernerd (Member) commented Apr 8, 2026

Summary

  • Problem: live QA scenario flows could hide real provider auth failures behind long generic timeouts, even when the underlying gateway had already produced an immediate actionable error.
  • Why it matters: this made live QA runs much slower to debug and much harder to trust, because provider setup failures looked like scenario timeouts instead of the real root cause.
  • What changed: QA now treats known fallback and error replies as terminal failures across both outbound-message waits and raw scenario condition waits, and external missing-API-key failures now preserve direct auth guidance instead of collapsing to the generic fallback reply.
  • What did NOT change (scope boundary): this does not change model auth itself; it fixes how QA surfaces those failures in scenario flows.

Change Type

  • Bug fix
  • Test

Root Cause

  • Root cause: the live QA scenario path accepted the channel fallback reply as ordinary outbound text, or kept polling success-specific conditions without checking whether a classified failure reply had already arrived.
  • Contributing context: the underlying gateway already knew the real error immediately, but the scenario flow surfaced it too late and too generically.

User-visible Changes

  • Live QA suite runs now fail fast when the channel reply is an obvious runtime failure reply.
  • That fail-fast wiring now covers both success-specific outbound waits and raw scenario `waitForCondition(...)` checks.
  • Missing API key failures now preserve direct provider guidance instead of only showing "Something went wrong while processing your request."
  • Repeated auth failure text like `x | x` is now collapsed into one clean message in the final QA report.
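
The report deduplication can be sketched as follows. This is an illustrative stand-in for the PR's `collapseRepeatedFailureDetail` helper; the `" | "` separator and trimming behavior here are assumptions, not the actual suite.ts code.

```typescript
// Collapse a detail string like "x | x | x" into "x" when every segment is
// the same failure text; distinct segments are left untouched.
function collapseRepeatedFailureDetail(detail: string): string {
  const parts = detail.split(" | ").map((p) => p.trim()).filter(Boolean);
  if (parts.length === 0) return detail;
  const first = parts[0];
  return parts.every((p) => p === first) ? first : detail;
}
```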

Why This Matters

This bug was first captured clearly through Kova.

Kova showed a sharp mismatch between two otherwise similar live runs:

  • openai/gpt-5.4 failed after a long timeout
  • openai-codex/gpt-5.4 passed quickly on the same machine

Example Kova evidence from the investigation:

candidate: openai/gpt-5.4
run: kova_20260408_183035268_cc484f
verdict: fail
duration: 380.9s
detail: timed out after 360000ms

candidate: openai-codex/gpt-5.4
run: kova_20260408_185441906_df9a53
verdict: pass
duration: 65.0s
judge score: 8.8

That made the bug obvious:

  • provider auth was not generically broken
  • the live QA scenario flow was hiding the real auth failure

Verification

Targeted tests

  • pnpm test extensions/qa-lab/src/reply-failure.test.ts extensions/qa-lab/src/suite.test.ts src/auto-reply/reply/agent-runner-execution.test.ts

Live repro before the fix

  • Direct manual lane already failed fast with the correct auth message:
    • pnpm openclaw qa manual --provider-mode live-frontier --model openai/gpt-5.4 --message 'Reply exactly: OPENAI_OK' --timeout-ms 30000
  • But the scenario lane could stall and surface a timeout instead of the auth failure.

Live repro after the fix

  • pnpm openclaw qa suite --provider-mode live-frontier --model openai/gpt-5.4 --alt-model openai/gpt-5.4 --scenario memory-recall --output-dir .artifacts/qa-e2e/repro-auth-timeout-memory-recall

Result:

  • suite finished in about 6.8s
  • scenario failed immediately with:
    • agent.wait returned error: FailoverError: No API key found for provider "openai". You are authenticated with OpenAI Codex OAuth. Use openai-codex/gpt-5.4 (OAuth) or set OPENAI_API_KEY to use openai/gpt-5.4.

Snippet from the generated QA report:

### Memory recall after context switch

- Status: fail
- Details: agent.wait returned error: FailoverError: No API key found for provider "openai". You are authenticated with OpenAI Codex OAuth. Use openai-codex/gpt-5.4 (OAuth) or set OPENAI_API_KEY to use openai/gpt-5.4.

Files

  • extensions/qa-lab/src/reply-failure.ts
  • extensions/qa-lab/src/reply-failure.test.ts
  • extensions/qa-lab/src/suite.ts
  • extensions/qa-lab/src/suite.test.ts
  • src/auto-reply/reply/agent-runner-execution.ts
  • src/auto-reply/reply/agent-runner-execution.test.ts

Risks and Mitigations

  • Risk: QA could now fail faster on replies that are intentional warning text.
  • Mitigation: the failure classifier is intentionally narrow and only matches known runtime fallback and error reply patterns.
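
The narrow-match claim can be illustrated with a sketch in the spirit of `extractQaFailureReplyText`. The specific patterns below are assumptions for illustration; the real classifier lives in extensions/qa-lab/src/reply-failure.ts.

```typescript
// Only replies matching a known runtime failure pattern are classified as
// terminal failures; ordinary assistant text (including warning-ish prose)
// passes through unclassified.
const FAILURE_REPLY_PATTERNS: RegExp[] = [
  /Something went wrong while processing your request/u,
  /No API key found for provider "[^"]+"/u,
  /^FailoverError:/u,
];

function extractQaFailureReplyText(text: string): string | null {
  const trimmed = text.trim();
  return FAILURE_REPLY_PATTERNS.some((re) => re.test(trimmed)) ? trimmed : null;
}
```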

@aisle-research-bot

aisle-research-bot Bot commented Apr 8, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

| # | Severity | Title |
| - | - | - |
| 1 | 🟡 Medium | Outbound assistant text propagated into thrown Errors and QA reports (potential secret leakage) |
| 2 | 🟡 Medium | User-facing replies disclose gateway auth/config state via missing-API-key errors |

1. 🟡 Outbound assistant text propagated into thrown Errors and QA reports (potential secret leakage)

| Property | Value |
| - | - |
| Severity | Medium |
| CWE | CWE-532 |
| Location | extensions/qa-lab/src/suite.ts:194-202 |

Description

waitForOutboundMessage / waitForCondition in extensions/qa-lab/src/suite.ts now treat certain assistant replies as failures and throw Error(...) using the full outbound message text.

This error is then:

  • converted to a string via formatErrorMessage(error) in runScenario() and stored as details
  • rendered verbatim into the generated Markdown report via renderQaMarkdownReport() (code fences preserve the full content)
  • optionally printed to stderr when OPENCLAW_QA_DEBUG=1

If an outbound assistant message contains sensitive data (e.g., tokens accidentally echoed by the model, tool outputs, internal URLs, credentials included in scenario prompts, etc.), that content will be persisted into CI artifacts/logs.

Vulnerable code:

```ts
if (failureMessage) {
  throw new Error(extractQaFailureReplyText(failureMessage.text) ?? failureMessage.text);
}
```

Recommendation

Avoid embedding full assistant text into exception messages that are later logged or persisted.

Options:

  1. Throw a structured error with a stable code, and redact/limit the text included in .message:

```ts
class QaFailureReplyError extends Error {
  constructor(public readonly failureKind: string) {
    super(`QA failure reply detected: ${failureKind}`);
    this.name = "QaFailureReplyError";
  }
}

// ...
const failureText = extractQaFailureReplyText(failureMessage.text);
if (failureText) {
  // Keep the full text only in memory if needed; do not log/persist by default.
  throw new QaFailureReplyError(failureText.slice(0, 80));
}
```

  2. If the full text is needed for debugging, store it in a separate field that is only written when an explicit debug flag is enabled, and/or redact obvious secret patterns before writing to reports.

  3. Add a maximum length cap to any logged/persisted model output (e.g., first 200 chars).

2. 🟡 User-facing replies disclose gateway auth/config state via missing-API-key errors
| Property | Value |
| - | - |
| Severity | Medium |
| CWE | CWE-209 |
| Location | src/auto-reply/reply/agent-runner-execution.ts:334-349 |

Description

buildExternalRunFailureText() converts upstream errors like No API key found for provider "..." into end-user facing reply text.

  • The reply explicitly reveals that the backend is using a gateway and that it lacks an API key for a specific provider.
  • For OpenAI it additionally advises setting OPENAI_API_KEY or using a specific OAuth model string, which leaks internal configuration details and can confuse/assist untrusted users.
  • These messages are returned on the normal external reply path when shouldSurfaceToControlUi is false (i.e., typical end-user channels), meaning an untrusted user may be able to trigger them (by selecting/forcing a provider) to infer provider availability and the gateway’s auth state.

Vulnerable code:

```ts
const SAFE_MISSING_API_KEY_PROVIDERS = new Set(["anthropic", "google", "openai", "openai-codex"]);

function buildMissingApiKeyFailureText(message: string): string | null {
  const providerMatch = normalizedMessage.match(/No API key found for provider "([^"]+)"/u);
  const provider = providerMatch?.[1]?.trim().toLowerCase();
  ...
  if (provider === "openai" && normalizedMessage.includes("OpenAI Codex OAuth")) {
    return "⚠️ Missing API key for OpenAI on the gateway. Use `openai-codex/gpt-5.4` for OAuth, or set `OPENAI_API_KEY`, then try again.";
  }
  if (SAFE_MISSING_API_KEY_PROVIDERS.has(provider)) {
    return `⚠️ Missing API key for provider "${provider}". Configure the gateway auth for that provider, then try again.`;
  }
}
```

Recommendation

Only surface detailed provider/auth configuration guidance in admin/control UI contexts.

  • For end-user channels, return a generic message (no provider names, no env var names), and log the detailed error server-side.
  • If you still want to help legitimate users, gate the detailed message behind an explicit admin/debug flag.

Example:

```ts
function buildMissingApiKeyFailureText(message: string, opts: { admin: boolean }): string | null {
  const provider = ...;
  if (!provider) return null;

  if (!opts.admin) {
    return "⚠️ Model configuration error. Please contact an administrator.";
  }

  return `⚠️ Gateway is missing credentials for provider "${provider}". Configure gateway auth and retry.`;
}
```

Analyzed PR: #63333 at commit 00bace5

Last updated on: 2026-04-08T21:02:26Z


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 342dde197f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/auto-reply/reply/agent-runner-execution.ts Outdated
Comment thread extensions/qa-lab/src/suite.ts
@greptile-apps
Contributor

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR adds fail-fast behavior to live QA scenario flows when the gateway returns a known auth/runtime error reply, and ensures that "No API key found for provider" errors preserve their direct user guidance instead of collapsing to the generic fallback message. The changes are narrow and well-targeted: a new `extractQaFailureReplyText` classifier in `extensions/qa-lab/src/reply-failure.ts`, a hook into `waitForOutboundMessage` in `suite.ts`, a `collapseRepeatedFailureDetail` deduplication helper, and a targeted override in `buildExternalRunFailureText` for the missing-API-key case.

Confidence Score: 5/5

Safe to merge — logic is correct, tests cover the key behaviors, and all remaining observations are P2.

No P0 or P1 issues. The classifier, deduplication helper, and buildExternalRunFailureText override are all correct. The only note is a design limitation (predicate-gated fail-fast) that is documented in the PR description and does not regress any existing behavior.

No files require special attention.

Vulnerabilities

No security concerns identified. The buildExternalRunFailureText change does forward a gateway-produced error string to the user reply, but provider names in that path are controlled by the system config, not user input. The failure classifier in reply-failure.ts is read-only and purely for error detection.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/qa-lab/src/suite.ts
Line: 184-187

Comment:
**Fail-fast is predicate-gated**

The failure check only fires when a message already matched `predicate`. A scenario step that waits for a specific reply text (e.g. `msg.text.includes("OPENAI_OK")`) won't short-circuit on an auth error — the error message won't satisfy that predicate, so the step still times out. This is a known design boundary (acknowledged in the PR), but worth a brief inline note so future scenario authors understand why their narrow-predicate steps can still time out on auth failures even after this fix.

How can I resolve this? If you propose a fix, please make it concise.
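
The design boundary described in that comment can be sketched as a minimal poll step. Names and shapes below are illustrative, not the actual suite.ts code.

```typescript
type OutboundMessage = { text: string };

// One poll of the wait loop: the failure check only runs on messages that
// already satisfy the caller's predicate.
function pollOnce(
  messages: OutboundMessage[],
  predicate: (msg: OutboundMessage) => boolean,
  isFailureReply: (text: string) => boolean,
): { kind: "match" | "failure"; msg: OutboundMessage } | null {
  for (const msg of messages) {
    if (!predicate(msg)) continue; // narrow predicates skip auth-error replies
    if (isFailureReply(msg.text)) return { kind: "failure", msg };
    return { kind: "match", msg };
  }
  return null; // no match yet: caller keeps polling until timeout
}
```

With a predicate like `(m) => m.text.includes("OPENAI_OK")`, an auth-error reply never reaches the failure check, so the step still times out; checking `isFailureReply` before the predicate would close that gap.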

Reviews (1): Last reviewed commit: "fix: fail fast on qa live auth errors" | Re-trigger Greptile

Comment thread extensions/qa-lab/src/suite.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97732e7469


Comment thread extensions/qa-lab/src/suite.ts Outdated
Comment thread extensions/qa-lab/src/reply-failure.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a719f9231


```ts
}

function createScenarioWaitForCondition(state: QaBusState) {
  const sinceIndex = state.getSnapshot().messages.length;
```

P1 Badge Recompute failure cursor after scenario bus resets

createScenarioWaitForCondition snapshots sinceIndex once from the pre-flow message length and reuses it for every poll. Several flows invoke reset() after API creation (for example qa/scenarios/model-switch-tool-continuity.md), which clears state.messages; with the stale pre-reset index, findFailureOutboundMessage slices past new outbound failures and the scenario can fall through to a timeout instead of failing fast on the auth/error reply.
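
One possible shape for the fix is to clamp the captured cursor against the current message count on every poll. The sketch below is illustrative and assumes a simplified snapshot shape, not the actual QaBusState API.

```typescript
type Snapshot = { messages: { text: string }[] };

// Scan for a failure reply at or after the captured cursor. If a reset()
// shrank the bus below that cursor, fall back to scanning from the start so
// fresh post-reset failures are not sliced away.
function findFailureSince(
  getSnapshot: () => Snapshot,
  capturedSinceIndex: number,
  isFailure: (text: string) => boolean,
): string | null {
  const { messages } = getSnapshot();
  const since = capturedSinceIndex > messages.length ? 0 : capturedSinceIndex;
  for (const msg of messages.slice(since)) {
    if (isFailure(msg.text)) return msg.text;
  }
  return null;
}
```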



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ce39b7ddf


```ts
}

function createScenarioWaitForCondition(state: QaBusState) {
  const sinceIndex = state.getSnapshot().messages.length;
```

P1 Badge Recompute failure cursor after bus resets

createScenarioWaitForCondition snapshots sinceIndex once from state.getSnapshot().messages.length, but scenario flows frequently call reset() after API creation; reset() clears messages, so this stale pre-reset index points past newly-added outbound errors. In that case findFailureOutboundMessage(..., { cursorSpace: "all" }) slices away fresh failure replies and the wait falls through to a timeout instead of failing fast on the real auth/error response.


@shakkernerd shakkernerd self-assigned this Apr 8, 2026
@shakkernerd shakkernerd force-pushed the fix/qa-live-auth-fail-fast branch from 5ce39b7 to 11fc6bf Compare April 8, 2026 20:50
@shakkernerd shakkernerd merged commit 540fcd4 into main Apr 8, 2026
9 checks passed
@shakkernerd shakkernerd deleted the fix/qa-live-auth-fail-fast branch April 8, 2026 20:55
@shakkernerd
Member Author

Landed via rebase.

  • Prepared head SHA: 00bace5
  • Merge commit: 540fcd4
  • Local verification: focused QA/auth Vitest targets passed, pnpm build passed, and pnpm tsgo still reproduced the unrelated baseline failure already present on latest main.

Thanks @shakkernerd!


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00bace5815


```ts
}

function createScenarioWaitForCondition(state: QaBusState) {
  const sinceIndex = state.getSnapshot().messages.length;
```

P1 Badge Recompute scenario failure cursor after bus resets

createScenarioWaitForCondition snapshots sinceIndex once from state.getSnapshot().messages.length and reuses it for every poll, but many scenario flows call reset() after API creation (for example qa/scenarios/model-switch-tool-continuity.md). Because reset() clears messages, this stale pre-reset index can point past all newly added messages, so findFailureOutboundMessage(..., { cursorSpace: "all" }) skips fresh auth/runtime failure replies and the scenario falls through to a timeout instead of failing fast on the real error.

