fix(agents): sanitize raw HTTP 401 provider errors in user-visible replies (#56197)#77394
Conversation
|
Codex review: needs changes before merge. Reviewed May 31, 2026, 1:25 AM ET / 05:25 UTC. Summary PR surface: Source +62, Tests +102, Docs +1. Total +165 across 3 files. Reproducibility: yes. from source inspection: current main still lacks a plain HTTP 401 invalid-token formatter branch, so those messages can fall through to raw HTTP/API payload formatting. I did not run the helper in this read-only review. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Port the sanitizer and focused tests to the current Do we have a high-confidence way to reproduce the issue? Yes, from source inspection: current main still lacks a plain HTTP 401 invalid-token formatter branch, so those messages can fall through to raw HTTP/API payload formatting. I did not run the helper in this read-only review. Is this the best way to solve the issue? Yes for the narrow copy-only scope: sanitizing the embedded-run fallback is the safest interim fix while refresh/retry stays provider-owned. The submitted branch is not the best merge shape until it is ported to the current helper files and the changelog edit is removed. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 51dee73a5d3e. Label changesLabel justifications:
Evidence reviewedPR surface: Source +62, Tests +102, Docs +1. Total +165 across 3 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
martingarramon
left a comment
There was a problem hiding this comment.
The new branch at src/agents/pi-embedded-helpers/errors.ts:976 keys on failoverClassification.reason === "auth", but that reason is set by two upstream paths that are NOT actual HTTP 401 responses:
-
HTTP 403 fall-through — at
errors.ts:666, bothstatus === 401 || status === 403return theauthreason after billing/permanent-auth precedence. A real 403 (e.g., key revoked, scope-missing) with neither HTML body norpermission_errorJSON would classify asauthand display copy claiming HTTP 401 when the provider actually returned 403. -
Message-only
invalid_api_key— aterrors.ts:849,isAuthErrorMessage(raw)returns the sameauthreason for messages without any HTTP status prefix. Same false-positive shape: copy claims HTTP 401 when no HTTP status was present.
Concrete fix: since classifyProviderRuntimeFailureKind already receives status (extracted via extractLeadingHttpStatus(raw)?.code in the formatAssistantErrorText caller), gate the new branch on it explicitly:
if (
failoverClassification?.kind === "reason" &&
failoverClassification.reason === "auth" &&
status === 401
) {
return "auth_invalid_token";
}This excludes the 403 fall-through (different status) and the message-only path (status is undefined). For those cases the existing fallback copy applies — which is what users saw before this change for those shapes, so no behavior regression.
Test gap: the PR's existing negative test covers 401 billing / permanent-auth, but not:
- Plain 403 falling to generic
auth(e.g."403 Forbidden"withoutpermission_errorJSON or HTML body) - Message-only
invalid_api_key(e.g.'{"error":{"code":"invalid_api_key"}}'with no HTTP status prefix)
Both should assert the output does NOT contain "HTTP 401". Adding these alongside the existing positive tests will prevent accidental re-broadening of the gate.
The PR's direction is correct; the user-visible regression is real. The gate-on-status fix above is the narrowest path to ship this without expanding scope into the token-refresh question.
…(review feedback from openclaw#77394)
|
I updated this, can you check it? |
martingarramon
left a comment
There was a problem hiding this comment.
LGTM. The new gate requires explicit 401 evidence, so the plain 403 fall-through and message-only invalid_api_key cases no longer produce the HTTP 401 invalid-token copy. The added negative tests cover both regressions from the review, and the ambiguous message-text path avoids over-claiming a specific HTTP status. Passing this back to maintainers.
martingarramon
left a comment
There was a problem hiding this comment.
Both concerns from the earlier round are addressed.
The 401-evidence gate (status === 401 || (status === undefined && messageMentions401 && !messageMentions403)) correctly excludes plain-403 fall-throughs and message-only payloads — the two shapes I flagged. Conservative in the right direction: a rare message embedding both 401 and 403 falls through to the generic auth path rather than claiming auth_invalid_token. The two negative tests pin this directly; the 'status code: 401, message: "expired token"' variant exercises the status === undefined leg of the gate.
LGTM.
da2b17a to
08dcfcb
Compare
…(review feedback from openclaw#77394)
|
Rebased onto current No code changes since @martingarramon's LGTM: same two commits ( Card should be back to green and mergeable now. |
martingarramon
left a comment
There was a problem hiding this comment.
LGTM holds after the May 10 rebase. The fix shape is unchanged (2590d63 + 08dcfcbc), the 403 anti-regression test is in, and the billing/auth_permanent precedence gate is pinned.
Remaining CI: Real behavior proof failure still needs your rerun/proof submission. checks-fast-contracts-plugins failure looks rerun/rebase-sensitive given the main-churn failure shape at the time of that push — current main is clean.
08dcfcb to
1a5dffd
Compare
…(review feedback from openclaw#77394)
|
Rebased onto current For RBP I dropped a 5-case direct-invocation capture of |
|
RBP block added. The five cases cover the review control points: raw 401 evidence, plain 403 fall-through without a 401 claim, message-only path, billing precedence, and the LGTM still holds. |
|
This pull request has been automatically marked as stale due to inactivity. |
…(review feedback from openclaw#77394)
1a5dffd to
81b255e
Compare
|
Land-ready after maintainer fixup. What changed since the original branch:
Local proof on head
GitHub proof:
Known gap: no live Feishu or Z.AI/AutoGLM expired-token session was run; token refresh/retry remains separate provider-owned follow-up work. |
Summary
Fixes the embedded-agent user-facing error path for plain provider HTTP 401 credential failures. The reported Feishu/Z.AI case returned
HTTP 401: "Invalid token"directly to chat users when an upstream JWT expired mid-session.This PR keeps token refresh/retry out of scope. It only maps credential-shaped HTTP 401 auth failures to safe re-authentication copy while preserving existing handling for replay-invalid session errors, schema errors, billing/key-limit errors, OpenAI scope errors, and plain 403s.
Fix
auth_invalid_tokenundersrc/agents/embedded-agent-helpers/errors.ts.Invalid token,Unauthorized,Incorrect API key, and structured 401 permission payloads whose message isInvalid token.CHANGELOG.md.Real behavior proof
Behavior addressed: Raw provider credential failures such as
HTTP 401: "Invalid token"no longer become chat-visible reply text fromformatAssistantErrorText.Real environment tested: Local OpenClaw source checkout on macOS, using the current embedded-agent helper module directly through the workspace TypeScript runtime after rebasing onto
origin/main.Exact steps or command run after this patch:
node_modules/.bin/tsx -e '<direct import of ./src/agents/embedded-agent-helpers/errors.ts and calls to formatAssistantErrorText for raw 401/403/auth payloads>'Evidence after fix: Direct runtime invocation output from the patched helper:
Observed result after fix: Credential-shaped 401 cases return the friendly re-authentication copy. Negative controls keep their prior lanes: plain 403 does not claim HTTP 401, message-only auth without HTTP status does not claim HTTP 401, and provider-less missing-scope 401 payloads remain out of the invalid-token lane.
What was not tested: No live Feishu or Z.AI/AutoGLM expired-token session was run. Automatic token refresh/retry remains a separate provider-owned follow-up.
Verification
pnpm exec oxfmt --check --threads=1 src/agents/embedded-agent-helpers/errors.ts src/agents/embedded-agent-helpers.formatassistanterrortext.test.ts src/agents/embedded-agent-error-observation.test.tsnode scripts/run-vitest.mjs src/agents/embedded-agent-helpers.formatassistanterrortext.test.ts src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts src/agents/embedded-agent-error-observation.test.ts-> 219 helper tests and 16 observation tests passed..agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main --parallel-tests "node scripts/run-vitest.mjs src/agents/embedded-agent-helpers.formatassistanterrortext.test.ts src/agents/embedded-agent-helpers.isbillingerrormessage.test.ts src/agents/embedded-agent-error-observation.test.ts"-> clean, no accepted/actionable findings.Fixes #56197.