fix: fail fast on qa live auth errors #63333
Conversation
🔒 Aisle Security Analysis
We found 2 potential security issues in this PR:
1. 🟡 Outbound assistant text propagated into thrown Errors and QA reports (potential secret leakage)
Description
The full assistant text is embedded in the thrown error message, which is then logged and persisted by the harness. If an outbound assistant message contains sensitive data (e.g., tokens accidentally echoed by the model, tool outputs, internal URLs, credentials included in scenario prompts), that content will be persisted into CI artifacts/logs.

Vulnerable code:

```typescript
if (failureMessage) {
  throw new Error(extractQaFailureReplyText(failureMessage.text) ?? failureMessage.text);
}
```

Recommendation

Avoid embedding full assistant text into exception messages that are later logged or persisted. Options:
```typescript
class QaFailureReplyError extends Error {
  constructor(public readonly failureKind: string) {
    super(`QA failure reply detected: ${failureKind}`);
    this.name = "QaFailureReplyError";
  }
}

// ...
const failureText = extractQaFailureReplyText(failureMessage.text);
if (failureText) {
  // keep full text only in memory if needed, but do not log/persist by default
  throw new QaFailureReplyError(failureText.slice(0, 80));
}
```
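As a hedged illustration of the recommendation above, the truncation can also be enforced at construction time so no caller can forget it; `toSafeError` is a hypothetical helper, not code from this PR:

```typescript
// Same class shape as the recommendation above.
class QaFailureReplyError extends Error {
  constructor(public readonly failureKind: string) {
    super(`QA failure reply detected: ${failureKind}`);
    this.name = "QaFailureReplyError";
  }
}

// Hypothetical helper: bound what can ever leave this function. Only a
// short, truncated kind string reaches logs or CI artifacts; the full
// assistant reply is never placed on the Error object.
function toSafeError(replyText: string): QaFailureReplyError {
  return new QaFailureReplyError(replyText.slice(0, 80));
}
```

A caller can then log `err.failureKind` (at most 80 UTF-16 units) while keeping the full reply text in memory only, never in persisted artifacts.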
2. 🟡 User-facing replies disclose gateway auth/config state via missing-API-key errors
Description
Vulnerable code:

```typescript
const SAFE_MISSING_API_KEY_PROVIDERS = new Set(["anthropic", "google", "openai", "openai-codex"]);

function buildMissingApiKeyFailureText(message: string): string | null {
  const providerMatch = normalizedMessage.match(/No API key found for provider "([^"]+)"/u);
  const provider = providerMatch?.[1]?.trim().toLowerCase();
  ...
  if (provider === "openai" && normalizedMessage.includes("OpenAI Codex OAuth")) {
    return "⚠️ Missing API key for OpenAI on the gateway. Use `openai-codex/gpt-5.4` for OAuth, or set `OPENAI_API_KEY`, then try again.";
  }
  if (SAFE_MISSING_API_KEY_PROVIDERS.has(provider)) {
    return `⚠️ Missing API key for provider "${provider}". Configure the gateway auth for that provider, then try again.`;
  }
}
```

Recommendation

Only surface detailed provider/auth configuration guidance in admin/control UI contexts.
Example:

```typescript
function buildMissingApiKeyFailureText(message: string, opts: { admin: boolean }): string | null {
  const provider = ...;
  if (!provider) return null;
  if (!opts.admin) {
    return "⚠️ Model configuration error. Please contact an administrator.";
  }
  return `⚠️ Gateway is missing credentials for provider "${provider}". Configure gateway auth and retry.`;
}
```

Analyzed PR: #63333 at commit …. Last updated on: 2026-04-08T21:02:26Z
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 342dde197f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile Summary

This PR adds fail-fast behavior to live QA scenario flows when the gateway returns a known auth/runtime error reply, and ensures that "No API key found for provider" errors preserve their direct user guidance instead of collapsing to the generic fallback message. The changes are narrow and well-targeted: a new classifier, a deduplication helper, and a `buildExternalRunFailureText` override.

Confidence Score: 5/5

Safe to merge: logic is correct, tests cover the key behaviors, and all remaining observations are P2. No P0 or P1 issues. The classifier, deduplication helper, and `buildExternalRunFailureText` override are all correct. The only note is a design limitation (predicate-gated fail-fast) that is documented in the PR description and does not regress any existing behavior. No files require special attention.
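The deduplication helper the summary refers to is not shown on this page. A minimal sketch of the idea, collapsing consecutive duplicate failure lines into one `<text> x N` report entry (the function name and the `x N` suffix format are assumptions, not the PR's actual code), might look like:

```typescript
// Collapse runs of identical consecutive lines into "<line> x N".
// Non-repeated lines pass through unchanged.
function collapseRepeatedFailures(lines: string[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    const last = out[out.length - 1];
    // If the previous output entry is already "<base> x N", recover base and N.
    const match = last?.match(/^(.*) x (\d+)$/);
    const base = match ? match[1] : last;
    if (base === line) {
      const count = match ? Number(match[2]) + 1 : 2;
      out[out.length - 1] = `${line} x ${count}`;
    } else {
      out.push(line);
    }
  }
  return out;
}
```

Applied to a report containing the same gateway error three times in a row, this yields a single `… x 3` entry instead of three duplicate lines.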
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 97732e7469
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2a719f9231
```typescript
function createScenarioWaitForCondition(state: QaBusState) {
  const sinceIndex = state.getSnapshot().messages.length;
  // ...
```
Recompute failure cursor after scenario bus resets
createScenarioWaitForCondition snapshots sinceIndex once from the pre-flow message length and reuses it for every poll. Several flows invoke reset() after API creation (for example qa/scenarios/model-switch-tool-continuity.md), which clears state.messages; with the stale pre-reset index, findFailureOutboundMessage slices past new outbound failures and the scenario can fall through to a timeout instead of failing fast on the auth/error reply.
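One way to implement the suggested fix, sketched under the assumption that the bus can expose a reset counter in its snapshot (the `resetCount` field and these type shapes are hypothetical stand-ins, not the real QaBusState API):

```typescript
interface Message { direction: "inbound" | "outbound"; text: string }
// Hypothetical snapshot shape: messages plus a counter that the real
// reset() would increment each time it clears the bus.
interface Snapshot { messages: Message[]; resetCount: number }
interface QaBusState { getSnapshot(): Snapshot }

function createScenarioWaitForCondition(state: QaBusState) {
  const initial = state.getSnapshot();
  let sinceIndex = initial.messages.length;
  let seenResets = initial.resetCount;
  return function poll(): Message | undefined {
    const { messages, resetCount } = state.getSnapshot();
    if (resetCount !== seenResets) {
      // The bus was reset since the cursor was taken: restart the scan
      // from 0 so fresh outbound failure replies are not sliced away.
      seenResets = resetCount;
      sinceIndex = 0;
    }
    return messages
      .slice(sinceIndex)
      .find((m) => m.direction === "outbound" && /error/i.test(m.text));
  };
}
```

With this shape, a failure reply posted after a `reset()` is found on the next poll instead of the scenario falling through to a timeout.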
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ce39b7ddf
```typescript
function createScenarioWaitForCondition(state: QaBusState) {
  const sinceIndex = state.getSnapshot().messages.length;
  // ...
```
Recompute failure cursor after bus resets
createScenarioWaitForCondition snapshots sinceIndex once from state.getSnapshot().messages.length, but scenario flows frequently call reset() after API creation; reset() clears messages, so this stale pre-reset index points past newly-added outbound errors. In that case findFailureOutboundMessage(..., { cursorSpace: "all" }) slices away fresh failure replies and the wait falls through to a timeout instead of failing fast on the real auth/error response.
Force-pushed 5ce39b7 to 11fc6bf.
Landed via rebase.
Thanks @shakkernerd!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00bace5815
```typescript
function createScenarioWaitForCondition(state: QaBusState) {
  const sinceIndex = state.getSnapshot().messages.length;
  // ...
```
Recompute scenario failure cursor after bus resets
createScenarioWaitForCondition snapshots sinceIndex once from state.getSnapshot().messages.length and reuses it for every poll, but many scenario flows call reset() after API creation (for example qa/scenarios/model-switch-tool-continuity.md). Because reset() clears messages, this stale pre-reset index can point past all newly added messages, so findFailureOutboundMessage(..., { cursorSpace: "all" }) skips fresh auth/runtime failure replies and the scenario falls through to a timeout instead of failing fast on the real error.
Summary
Change Type
Root Cause
User-visible Changes
- Live scenario runs now fail fast when a known auth/runtime error reply arrives, instead of waiting out the `waitForCondition(...)` checks.
- Repeated `Something went wrong while processing your request.` replies are now collapsed into one clean message in the final QA report.

Why This Matters
This bug was first captured clearly through Kova.
Kova showed a sharp mismatch between two otherwise similar live runs:
- `openai/gpt-5.4` failed after a long timeout
- `openai-codex/gpt-5.4` passed quickly on the same machine

Example Kova evidence from the investigation:
That made the bug obvious:
Verification
Targeted tests
```shell
pnpm test extensions/qa-lab/src/reply-failure.test.ts extensions/qa-lab/src/suite.test.ts src/auto-reply/reply/agent-runner-execution.test.ts
```

Live repro before the fix:

```shell
pnpm openclaw qa manual --provider-mode live-frontier --model openai/gpt-5.4 --message 'Reply exactly: OPENAI_OK' --timeout-ms 30000
```

Live repro after the fix:

```shell
pnpm openclaw qa suite --provider-mode live-frontier --model openai/gpt-5.4 --alt-model openai/gpt-5.4 --scenario memory-recall --output-dir .artifacts/qa-e2e/repro-auth-timeout-memory-recall
```

Result: the scenario now fails fast in 6.8s with `agent.wait returned error: FailoverError: No API key found for provider "openai". You are authenticated with OpenAI Codex OAuth. Use openai-codex/gpt-5.4 (OAuth) or set OPENAI_API_KEY to use openai/gpt-5.4.`

Snippet from the generated QA report:
Files
- extensions/qa-lab/src/reply-failure.ts
- extensions/qa-lab/src/reply-failure.test.ts
- extensions/qa-lab/src/suite.ts
- extensions/qa-lab/src/suite.test.ts
- src/auto-reply/reply/agent-runner-execution.ts
- src/auto-reply/reply/agent-runner-execution.test.ts

Risks and Mitigations