
fix(codex): guard against stale codex app snapshots leading to plugin invocation failure#83807

Merged
kevinslin merged 14 commits into
openclaw:mainfrom
kevinslin:dev/kevinlin/plugins-list-enable-disable
May 19, 2026

Conversation

@kevinslin

@kevinslin kevinslin commented May 18, 2026

Contributor

Summary

  • resolve live Codex plugin config at attempt time so public Codex app-server runs use the current codexPlugins settings
  • force and log app/list refreshes when plugin-owned apps are missing or not ready, and sanitize HTML challenge failures in inventory diagnostics
  • preserve fail-closed behavior when app inventory is unavailable while still refreshing stale cached inventory in the background
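The first bullet can be illustrated with a minimal sketch. It contrasts a harness that captures a config snapshot at registration time with one that calls a resolver on every attempt. Only the name resolvePluginConfig comes from the review notes in this PR; the types, the two factory functions, and the "google-calendar" key are hypothetical and exist only for illustration.

```typescript
// Hypothetical shapes for illustration; not the actual OpenClaw types.
type CodexPluginConfig = { codexPlugins: Record<string, { enabled: boolean }> };
type ConfigResolver = () => CodexPluginConfig;

// A harness that captures a snapshot at registration time goes stale
// as soon as the settings change.
function createStaticHarness(snapshot: CodexPluginConfig) {
  return { runAttempt: () => snapshot };
}

// A harness that receives a resolver re-reads the settings on every attempt.
function createLiveHarness(resolvePluginConfig: ConfigResolver) {
  return { runAttempt: () => resolvePluginConfig() };
}

// Simulated mutable settings store (assumption: settings can be replaced at runtime).
let current: CodexPluginConfig = { codexPlugins: {} };

const staticHarness = createStaticHarness(current);
const liveHarness = createLiveHarness(() => current);

// A plugin is enabled after both harnesses were registered.
current = { codexPlugins: { "google-calendar": { enabled: true } } };

const staleView = Object.keys(staticHarness.runAttempt().codexPlugins);
const liveView = Object.keys(liveHarness.runAttempt().codexPlugins);
console.log({ staleView, liveView });
```

In this sketch the static harness still sees no plugins after the change, while the resolver-based harness sees the newly enabled one.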

Verification

  • pnpm changed:lanes --json
  • pnpm check:changed
  • pnpm build
  • node scripts/run-vitest.mjs extensions/codex/index.test.ts extensions/codex/src/app-server/plugin-thread-config.test.ts extensions/codex/src/app-server/app-inventory-cache.test.ts extensions/codex/src/app-server/plugin-inventory.test.ts extensions/codex/src/app-server/plugin-activation.test.ts extensions/codex/src/app-server/thread-lifecycle.test.ts
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/app-inventory-cache.test.ts extensions/codex/src/app-server/plugin-thread-config.test.ts

Real behavior proof

Behavior addressed: Regular-profile Codex Google Calendar plugin calls failed because the Codex harness could miss live codexPlugins config and the app inventory recovery path was opaque. This patch makes the live config path authoritative, refreshes and logs app inventory recovery, and keeps the thread app binding fail-closed when inventory cannot verify readiness.
Real environment tested: Isolated OpenClaw gateway workspace regular-google-calendar-plugin-token3 using auth profile openai-codex:default, gateway ws://127.0.0.1:19992, and the managed Codex app-server Google Calendar plugin path.
Exact steps or command run after this patch: OPENCLAW_INTEG_GATEWAY_FORCE=1 ./.mem/integ/scripts/run_integ_gateway.mjs regular-google-calendar-plugin-token3 19992, then a live agent request in session claw-integ-regular-google-calendar-fixed-4 using the saved prompt in .mem/main/proofs/demo-11-regular-google-calendar-plugin/raw/live-calendar-profile.command.md, then UV_CACHE_DIR=/private/tmp/openclaw-uv-cache UV_TOOL_DIR=/private/tmp/openclaw-uv-tools uvx showboat verify .mem/main/proofs/demo-11-regular-google-calendar-plugin/raw/showboat-summary.md.
Evidence after fix: Terminal output captured in the Showboat summary artifact:

session_id=claw-integ-regular-google-calendar-fixed-4
run_id=claw-integ-c646acf5-590c-49cd-82a9-a639197c3a2f
auth_profile=openai-codex:default
google_calendar_app_ids=connector_947e0d954944416db111db556030eea6
final_marker=GOOGLE_CALENDAR_PLUGIN_OK

Observed result after fix: The live gateway run refreshed inventory, bound the Google Calendar connector app into plugin app policy context, and returned GOOGLE_CALENDAR_PLUGIN_OK from the Google Calendar app/plugin tool path.
What was not tested: No event mutation flow was exercised; this proof covers read/query availability only. The broad pnpm test:changed lane previously showed unrelated local failures around approval-policy defaults and approvalsReviewer expectations.

@clawsweeper

clawsweeper Bot commented May 18, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch updates the Codex plugin harness and app-server inventory path to resolve live codexPlugins config, force-refresh missing or not-ready plugin app inventory, and add redacted diagnostics with focused tests.

Reproducibility: yes, at source/proof level, but I did not run a fresh local or live repro in this read-only review. Current main wires static api.pluginConfig into the public harness while the PR body shows an after-fix live Google Calendar run reaching GOOGLE_CALENDAR_PLUGIN_OK.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: The PR has strong live proof and a focused implementation with no blocking findings, while the auth/app-policy boundary still needs ordinary owner review.

Rank-up moves:

  • Add one redacted destructive allow/deny live proof if maintainers want stronger security-boundary confidence before merge.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🥚 common Sunspot Signal Puff

       /\  .---.  /\         
      /  \/     \/  \        
     /   ( -   - )   \       
    |       ._.       |      
    |   /|  ===  |\   |      
     \  \|______/|/  /       
      '._  `--'  _.'         
         '-.__.-'            
       _/|_|  |_|\_          
      /__|      |__\         
       .-----------.         
      '-------------'        

Rarity: 🥚 common.
Trait: collects tiny proofs.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • How to hatch it: reach status: 👀 ready for maintainer look or status: 🚀 automerge armed; that usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal/live-output proof from an isolated gateway run showing a Google Calendar connector app id and GOOGLE_CALENDAR_PLUGIN_OK; future proof should keep private IDs, tokens, phone numbers, and endpoints redacted.

Risk before merge
Why this matters:

  • This PR changes which auth-profile-scoped Codex app inventory is authoritative for plugin app exposure, so a Codex app-policy/auth owner should explicitly accept the account-scoped boundary before merge.
  • The live proof covers Google Calendar read/query availability but not a destructive allow/deny plugin path, even though the changed code affects app-level destructive policy context.

Maintainer options:

  1. Accept scoped live proof after Codex owner review (recommended)
    A Codex app-policy/auth owner can merge after confirming the read-path live proof plus focused tests are enough for the intended account-scoped inventory recovery behavior.
  2. Ask for destructive allow/deny proof
    Before merge, request one redacted live run showing a plugin destructive action is allowed or denied according to codexPlugins policy for the same auth profile.
  3. Pause if the app inventory contract is unstable
    If maintainers are not ready to treat refreshed app/list inventory as authoritative for plugin app exposure, pause this PR until that Codex contract is settled.

Next step before merge
The protected maintainer label and auth/security-boundary merge risks require Codex owner review rather than an automated repair.

Security
Cleared: No concrete security or supply-chain defect was found in the diff; the remaining security-boundary concern is maintainer acceptance of the Codex app-policy behavior rather than a discrete patch bug.

Review details

Best possible solution:

Land the patch after Codex app-policy/auth owner review accepts the account-scoped inventory recovery behavior; request one destructive allow/deny proof only if maintainers need stronger security-boundary confidence.

Do we have a high-confidence way to reproduce the issue?

Yes at source/proof level, but I did not run a fresh local or live repro in this read-only review. Current main wires static api.pluginConfig into the public harness while the PR body shows an after-fix live Google Calendar run reaching GOOGLE_CALENDAR_PLUGIN_OK.

Is this the best way to solve the issue?

Yes, with owner acceptance: resolving live config at attempt time and forcing target app inventory refresh during thread-config build is a narrow fix for the stale snapshot path. The patch keeps missing inventory fail-closed, which is the safer default for plugin app exposure.
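The refresh-on-stale behavior described above might look roughly like the sketch below. Every name here (createInventoryCache, resolveReadyApps, the AppState shape) is hypothetical, and the fetcher is synchronous only to keep the example short; a real app/list call would be asynchronous.

```typescript
// Hypothetical inventory entry; "ready" stands in for accessible and enabled.
type AppState = { id: string; ready: boolean };

// Small cache around a fetcher. Synchronous for brevity (assumption).
function createInventoryCache(fetchApps: () => AppState[]) {
  let cached: AppState[] | undefined;
  let fetchCount = 0;
  return {
    read(forceRefresh = false): AppState[] {
      if (cached === undefined || forceRefresh) {
        fetchCount += 1;
        cached = fetchApps();
      }
      return cached;
    },
    fetches: () => fetchCount,
  };
}

// Read once from cache; if any target app is missing or not ready,
// log and force a single refresh before deciding what to expose.
function resolveReadyApps(
  cache: ReturnType<typeof createInventoryCache>,
  targetIds: string[],
): string[] {
  const allReady = (apps: AppState[]) =>
    targetIds.every((id) => apps.some((a) => a.id === id && a.ready));
  let apps = cache.read();
  if (!allReady(apps)) {
    console.warn("target apps missing or not ready; forcing inventory refresh");
    apps = cache.read(true);
  }
  // Only apps that are verified ready are returned (fail closed otherwise).
  return targetIds.filter((id) => apps.some((a) => a.id === id && a.ready));
}

// Simulated backend: the first response is stale, the second has the app ready.
const responses: AppState[][] = [[], [{ id: "calendar", ready: true }]];
let call = 0;
const cache = createInventoryCache(() => responses[Math.min(call++, responses.length - 1)]);
const readyApps = resolveReadyApps(cache, ["calendar"]);
console.log(readyApps, cache.fetches());
```

If the forced refresh still does not show the app as ready, the function returns an empty list rather than guessing, which mirrors the fail-closed default the review calls out.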

Label justifications:

  • P1: The PR targets a broken live Codex plugin workflow where configured Google Calendar plugin calls can fail for regular-profile users.
  • merge-risk: 🚨 auth-provider: The diff changes how auth-profile-scoped Codex app inventory and live codexPlugins config determine which plugin apps are exposed.
  • merge-risk: 🚨 security-boundary: The diff affects app-level plugin exposure and destructive-action policy context, so owner review should confirm the fail-closed boundary remains correct.

What I checked:

  • Current main uses stale harness config: Current main computes resolveCurrentPluginConfig, but the public Codex harness is registered with static api.pluginConfig; the harness then forwards that static value into attempts, side questions, and compaction. (extensions/codex/index.ts:34, 6f18decb7a2c)
  • PR resolves live config at attempt time: The PR head adds resolvePluginConfig and calls it for runAttempt, runSideQuestion, and compact, while index.ts passes the live config resolver into the public harness. (extensions/codex/harness.ts:17, 9c7666eaa6ff)
  • PR preserves fail-closed app exposure while refreshing stale inventory: The PR reads app inventory without scheduling duplicate refresh on the first pass, force-refreshes for missing/not-ready plugin apps, and only emits thread apps when inventory is not missing and the app is accessible and enabled. (extensions/codex/src/app-server/plugin-thread-config.ts:105, 9c7666eaa6ff)
  • Diagnostics avoid raw HTML challenge output: The PR sanitizes app inventory refresh diagnostics, omits HTML response bodies, redacts sensitive query/data keys, and logs only a cache-key fingerprint on app-list failures. (extensions/codex/src/app-server/app-inventory-cache.ts:184, 9c7666eaa6ff)
  • Documented contract requires target-session app inventory readiness: The Codex native plugin docs say runtime app inventory is the target-session accessibility check and that only enabled, accessible plugin apps should be injected into the restrictive thread app config. Public docs: docs/plugins/codex-native-plugins.md. (docs/plugins/codex-native-plugins.md:108, 6f18decb7a2c)
  • Real behavior proof in PR body: The PR body reports a live isolated gateway run for Google Calendar using openai-codex:default that refreshed inventory, bound a connector app id, and returned GOOGLE_CALENDAR_PLUGIN_OK; destructive mutation flow was explicitly not tested. (9c7666eaa6ff)
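The diagnostics item in the list above (omit HTML bodies, redact sensitive query keys) can be sketched as follows. This is an illustration only: the key list, the function names, and the log-entry shape are assumptions, not the contents of app-inventory-cache.ts, and the cache-key fingerprint is not covered here.

```typescript
// Assumed set of query keys to redact; the real list may differ.
const SENSITIVE_KEYS = new Set(["token", "access_token", "code", "key", "secret"]);

// Detect an HTML document, such as a browser challenge page.
function looksLikeHtml(body: string): boolean {
  return /^\s*<(!doctype html|html)\b/i.test(body);
}

// Replace the values of sensitive query parameters, leaving other parameters intact.
function redactQuery(raw: string): string {
  return raw.replace(/([?&])([^=&#]+)=([^&#]*)/g, (match, sep: string, key: string) =>
    SENSITIVE_KEYS.has(key.toLowerCase()) ? `${sep}${key}=REDACTED` : match,
  );
}

// Build a log-safe description of a failed inventory request.
function sanitizeFailure(status: number, body: string, url: string) {
  return {
    status,
    url: redactQuery(url),
    // HTML challenge pages are dropped entirely; other bodies are truncated.
    body: looksLikeHtml(body) ? "[html response omitted]" : body.slice(0, 200),
  };
}

const entry = sanitizeFailure(
  403,
  "<!DOCTYPE html><html><body>Checking your browser</body></html>",
  "https://example.test/app/list?token=abc123&page=2",
);
console.log(entry);
```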

Likely related people:

  • kevinslin: Authored the merged native Codex plugin app support and follow-up plugin read-tool approval work that introduced and refined the central inventory/thread-config path. (role: feature owner and recent area contributor; confidence: high; commits: a1ac559ed7e6, cfc189de0adb; files: extensions/codex/src/app-server/plugin-thread-config.ts, extensions/codex/src/app-server/app-inventory-cache.ts, extensions/codex/src/app-server/plugin-inventory.ts)
  • steipete: Introduced core Codex app-server controls/protocol bridge and recently touched adjacent Codex harness/index behavior, making this a likely routing owner for app-server boundary review. (role: original app-server harness contributor and recent adjacent owner; confidence: medium; commits: 31a0b7bd42a5, 69566e43cb8b, 827b0de0ce74; files: extensions/codex/index.ts, extensions/codex/harness.ts, extensions/codex/src/app-server/run-attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6f18decb7a2c.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7edc32f21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +435 to +437
accessible: true,
enabled: true,
needsAuth: app.needsAuth,


P1 Badge Do not enable auth-required apps without inventory

When app/list is unavailable, this fallback fabricates accessible: true and enabled: true for every plugin/read app, including apps whose detail has needsAuth: true. The build loop below only rejects !app.accessible || !app.enabled, so an already-installed plugin with an auth-required app will get an enabled Codex app config solely because the inventory refresh failed; previously this path failed closed until app readiness could be verified.
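A minimal sketch of the fail-closed alternative, assuming a hypothetical Inventory union and selectThreadApps helper (neither is taken from the diff): when inventory is missing, no app is emitted, instead of fabricating accessible: true and enabled: true.

```typescript
// Hypothetical record shape, loosely following the fields quoted in the comment above.
type AppRecord = { id: string; accessible: boolean; enabled: boolean; needsAuth: boolean };

// Inventory is either verified ("ready") or unavailable ("missing").
type Inventory = { status: "ready"; apps: AppRecord[] } | { status: "missing" };

function selectThreadApps(inventory: Inventory, pluginAppIds: string[]): string[] {
  // Fail closed: without inventory, readiness cannot be verified, so expose nothing.
  if (inventory.status === "missing") return [];
  return pluginAppIds.filter((id) => {
    const app = inventory.apps.find((a) => a.id === id);
    return app !== undefined && app.accessible && app.enabled;
  });
}

const verified: Inventory = {
  status: "ready",
  apps: [
    { id: "calendar", accessible: true, enabled: true, needsAuth: false },
    { id: "mail", accessible: false, enabled: true, needsAuth: true },
  ],
};

const whenMissing = selectThreadApps({ status: "missing" }, ["calendar", "mail"]);
const whenReady = selectThreadApps(verified, ["calendar", "mail"]);
console.log({ whenMissing, whenReady });
```

With this shape, an app that still needs auth only appears if the verified inventory reports it as accessible and enabled; a failed refresh never promotes it.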


@clawsweeper clawsweeper Bot added labels on May 18, 2026: proof: sufficient (ClawSweeper judged the real behavior proof convincing.); rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.); status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.)
@kevinslin kevinslin force-pushed the dev/kevinlin/plugins-list-enable-disable branch from d7edc32 to 81c9863 Compare May 18, 2026 23:40
@kevinslin kevinslin requested review from a team as code owners May 18, 2026 23:40
@github-actions github-actions Bot added the dependencies-changed PR changes dependency-related files label May 18, 2026
@openclaw-barnacle openclaw-barnacle Bot added labels on May 18, 2026: docs (Improvements or additions to documentation) and the channel integration labels channel: discord, channel: googlechat, channel: imessage, channel: line, channel: matrix, channel: mattermost, channel: msteams, channel: nextcloud-talk, channel: nostr, channel: signal, channel: slack, channel: telegram, channel: tlon, channel: voice-call, channel: whatsapp-web, channel: zalo, channel: zalouser
@socket-security

socket-security Bot commented May 18, 2026


No dependency changes detected. Learn more about Socket for GitHub.


Comment thread extensions/codex/src/app-server/run-attempt.ts Outdated
Comment thread extensions/codex/src/app-server/run-attempt.ts Outdated
Comment thread extensions/codex/src/app-server/thread-lifecycle.ts Outdated

Labels

  • extensions: codex
  • maintainer: Maintainer-authored PR
  • merge-risk: 🚨 auth-provider: 🚨 May break OAuth, tokens, provider routing, model choice, or credentials.
  • merge-risk: 🚨 security-boundary: 🚨 May affect sandboxing, authorization, credentials, or sensitive data.
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: L
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.


3 participants