openai-codex: classify runtime failures and make full access truthful#64439
Conversation
There was a problem hiding this comment.
Pull request overview
This PR advances OpenClaw’s GPT-5.4/Codex parity effort by (1) ensuring Codex OAuth authorization URLs always include required scopes and (2) improving runtime truthfulness by classifying provider/runtime failures and accurately advertising whether /elevated full (auto-approved host exec) is actually available in the current runtime.
Changes:
- Normalize OpenAI Codex OAuth authorize URLs to always include required scopes (including
model.requestandapi.responses.write). - Introduce
ProviderRuntimeFailureKindclassification and thread it through error observation + lifecycle logging; update user-facing error text for several failure categories. - Extend embedded elevated metadata with
fullAccessAvailable/fullAccessBlockedReasonand adjust prompts/exec hints to avoid suggesting/elevated fullwhen unavailable.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/plugins/provider-openai-codex-oauth.ts | Normalizes authorize URLs to ensure required Codex scopes are present. |
| src/commands/openai-codex-oauth.test.ts | Updates/adds tests validating scope normalization behavior. |
| src/auto-reply/reply/get-reply-run.ts | Computes/threads full-access availability into exec hints and embedded runner params. |
| src/auto-reply/reply/get-reply-run.exec-hint.test.ts | Adds coverage for the new “full unavailable” exec hint copy. |
| src/auto-reply/reply/commands-system-prompt.ts | Includes fullAccessAvailable in sandboxInfo passed to system prompt bundle. |
| src/agents/system-prompt.ts | Updates system prompt to advertise /elevated full only when available and explain blocked states. |
| src/agents/system-prompt.test.ts | Adds/updates prompt assertions for full-access availability/blocked messaging. |
| src/agents/pi-embedded-subscribe.handlers.lifecycle.ts | Threads provider runtime failure kind into lifecycle logging and observation fields. |
| src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts | Adds assertions that runtime failure kind is logged for relevant errors. |
| src/agents/pi-embedded-runner/types.ts | Adds EmbeddedFullAccessBlockedReason and new elevated metadata fields. |
| src/agents/pi-embedded-runner/sandbox-info.ts | Adds resolveEmbeddedFullAccessState and plumbs full-access metadata into sandbox info. |
| src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts | Verifies sandbox info carries full-access truth when provided. |
| src/agents/pi-embedded-helpers/errors.ts | Implements provider/runtime failure kind classification + more truthful user-facing error text. |
| src/agents/pi-embedded-helpers.ts | Re-exports provider/runtime failure kind utilities/types. |
| src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts | Adds test coverage for provider/runtime failure kind classification. |
| src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts | Adds test coverage for the new user-facing error messages. |
| src/agents/pi-embedded-error-observation.ts | Extends observation fields with provider runtime failure kind (provider-aware). |
| src/agents/pi-embedded-error-observation.test.ts | Updates expectations for new observation fields and codex scope-lane behavior. |
| src/agents/bash-tools.exec-types.ts | Extends elevated defaults to optionally carry full-access availability + blocked reason. |
Greptile SummaryThis PR combines auth/runtime failure classification (
Confidence Score: 4/5Two P1 bugs (classification override in lifecycle log, lost fullAccessBlockedReason in commands prompt) should be fixed before merging. The core classification machinery (errors.ts, sandbox-info.ts, OAuth normalization) is correct. The two P1 issues are both straightforward one-to-three-line fixes with clear root causes, so the overall structural approach of the PR is sound. src/agents/pi-embedded-subscribe.handlers.lifecycle.ts and src/auto-reply/reply/commands-system-prompt.ts
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc46c3641d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the current review comments on runtime truthfulness. This push:
Focused validation passed locally:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43a8c13fc3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the concrete The failing job was not generic CI noise: What changed:
This keeps the test fixture aligned with current runtime exports without widening PR scope beyond runtime truthfulness. |
|
Addressed the remaining branch-owned runtime-truthfulness gap on the latest head What changed:
Local validation on this head:
Scope remains runtime truthfulness only:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e36f6db1d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Current head is now This update fixes the remaining branch-owned Local validation on this head:
The remaining red GitHub signals appear to be shared/base CI noise rather than PR-owned regressions: the same Once the rerun settles, this should be ready for human maintainer review as the runtime-truthfulness slice. |
|
@steipete finished these now for your review Chat GPT major upgrade fixes (make GPT 5.4 = as good as Opus 4.6 in Openclaw). cut down to 4 PR's now. |
|
Addressed the remaining runtime-truthfulness review items on head This update:
Focused validation passed locally:
All currently open review threads on this PR have been resolved against the latest head. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab8151899b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21488fd344
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Current head Local validation on this head:
That focused rerun passed Scope is still runtime truthfulness only. |
191d7f2 to
4c12b0c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c12b0cec3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (code <= 0x20) { | ||
| continue; |
There was a problem hiding this comment.
Restrict skipped chars to whitespace in base64 normalization
canonicalizeBase64 now skips every character with codepoint <= 0x20, which accepts non-whitespace control bytes (for example \0 or \x1F) as if they were valid separators. That means malformed payloads like YWJj\0ZA== are silently normalized to YWJjZA== instead of being rejected, weakening the validator used on media input paths. Please only skip actual whitespace characters here (matching prior behavior) so control characters still fail validation.
Useful? React with 👍 / 👎.
693222e to
5792ee9
Compare
|
Landed. Thanks @100yenadmin. Follow-up fixes applied before merge:
Local verification:
Merged as |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5792ee9bee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (code <= 0x20) { | ||
| continue; |
There was a problem hiding this comment.
Reject control bytes during base64 canonicalization
Skipping every byte with codepoint <= 0x20 makes canonicalizeBase64 silently accept non-whitespace control characters (for example \0 or \x1f) as separators, so malformed payloads can be normalized into valid-looking base64 instead of being rejected. This weakens validation on media input paths that rely on this function; e.g. YWJj\0ZA== now canonicalizes to YWJjZA==.
Useful? React with 👍 / 👎.
…i + pruning-defaults) Fixes three inherited failures that are red on every recent main commit since PR B (openclaw#64439) and PR C (openclaw#64300) landed. Each failure is a stale test surface that a production change forgot to update. 1) target-resolver.test.ts is mocking plugins/runtime.js with only getActivePluginChannelRegistryVersion. After src/channels/registry.ts started calling getActivePluginChannelRegistry + getActivePluginRegistry through listRegisteredChannelPluginEntries(), every resolveMessagingTarget test that reaches normalizeTargetForProvider fails with 'No getActivePluginChannelRegistry export is defined on the ../../plugins/runtime.js mock'. Adds both exports to the mock factory with a registry seeded from the channels the test cases actually use (discord, imessage, mattermost, slack, telegram) so normalizeAnyChannelId resolves them the way real runtime would. 2) memory-wiki/index.test.ts expects 16 gateway methods, but src/gateway.ts now registers 19 after wiki.importRuns, wiki.importInsights, and wiki.palace were added between wiki.status and wiki.init. Extends the expected list to match the real registration order. 3) test/extension-test-boundary.test.ts flags src/config/config.pruning-defaults.test.ts as a core test that deep-imports extensions/anthropic/provider-policy-api.js, which violates the extension-test-boundary rule. Moves the test to extensions/anthropic/config-pruning-defaults.test.ts with inferred types from applyAnthropicConfigDefaults's parameter/return shape, matching the pattern used by the sibling provider-policy-api.test.ts. Local validation: - pnpm test src/infra/outbound/target-resolver.test.ts - pnpm test extensions/memory-wiki/index.test.ts - pnpm test extensions/anthropic/config-pruning-defaults.test.ts - pnpm test test/extension-test-boundary.test.ts Refs openclaw#64227
Summary
This is the compact runtime-truthfulness slice of the GPT-5.4 / Codex parity program tracked in #64227.
It combines the original Contract 1 auth/runtime truthfulness work from #64229 with the Contract 4 permission truthfulness work from #64231, so OpenClaw tells the truth about both provider/runtime failures and whether
/elevated fullis actually available.Scope
What changed
openidprofileemailoffline_accessmodel.requestapi.responses.writeauth_scopeauth_refreshauth_html_403proxydnstimeoutschemasandbox_blockedreplay_invalidunknownproviderRuntimeFailureKindthrough embedded-run observation fields and lifecycle loggingfullAccessAvailableandfullAccessBlockedReason/elevated fullonly when auto-approved host exec is actually available for the current runtimeValidation
checkstack completed while landing the combined branch commitspnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/auto-reply/reply/get-reply-run.exec-hint.test.tsNon-goals
bash-tools.exec