fix: route plugin LLM completions through Codex runtime by jalehman · Pull Request #81511 · openclaw/openclaw

jalehman · 2026-05-13T19:15:21Z

Summary

Problem: plugin LLM completions treated canonical openai/gpt-* model refs as direct OpenAI API calls, even when config mapped those refs to the Codex runtime.
Why it matters: context-engine plugins such as lossless-claw could fail LLM summarization under Codex OAuth with a missing OPENAI_API_KEY, then fall back to deterministic truncation.
What changed: completion runtime selection now preserves the logical model ref while routing execution through the configured runtime provider when the resolved agent runtime is Codex.
What did NOT change: this does not grant plugins new model access or bypass model policy; it only makes plugin LLM completion honor the already-resolved runtime policy.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior addressed: lossless-claw doctor repair needed to use model-backed plugin LLM summarization while the active OpenClaw gateway was authenticated with OpenAI Codex OAuth, not a direct OPENAI_API_KEY.

Real environment tested: Josh's live OpenClaw gateway running this PR head, OpenClaw 2026.5.12-beta.1 (47697a7), Telegram group topic session agent:main:telegram:group:-1003774691294:topic:3731, model openai/gpt-5.5, runtime OpenAI Codex, auth openai-codex:josh@martian.engineering.

Exact steps or command run after this patch: deployed this PR branch locally, rebuilt and restarted the gateway, then ran /lossless doctor apply in the same Telegram/OpenClaw session that had a known fallback LCM summary repair candidate. Verified after the run with lcm_describe sum_22ef5580363377a1 and lcm_grep "LCM fallback summary".

Evidence after fix: before doctor apply, lcm_grep "LCM fallback summary" found recent repair candidate sum_22ef5580363377a1 in this session. After doctor apply, lcm_describe sum_22ef5580363377a1 returned a real summary instead of the fallback marker:

LCM_SUMMARY sum_22ef5580363377a1
meta conv=4754 kind=leaf depth=0 tok=656 descTok=0 srcTok=39920 desc=0 range=2026-05-12 08:14 PDT..2026-05-12 08:42 PDT budgetCap=4000
content
Codex GH auth is now working via symlinked isolated HOME/Keychains: `gh api user` returned `jalehman`, and `gh repo view openclaw/openclaw` returned `openclaw/openclaw MAINTAIN`. Best fix chosen was "make it work" by symlinking Codex HOME `.config/gh` and `Library/Keychains`, not host HOME isolation or launchd `GH_TOKEN`.

A separate issue appeared: `tools.toolSearch` was supposed to be disabled, but the gateway kept serving stale in-memory config...

The live session status for the proof run showed:

OpenClaw 2026.5.12-beta.1 (47697a7)
Model: openai/gpt-5.5
Auth: oauth (openai-codex:josh@martian.engineering)
Runtime: OpenAI Codex
Session: agent:main:telegram:group:-1003774691294:topic:3731

Observed result after fix: doctor apply repaired the fallback summary in place with model-generated summary content and did not hit the previous direct OpenAI auth failure path. The repaired summary no longer contains [LCM fallback summary], and a follow-up grep for LCM fallback summary no longer returned sum_22ef5580363377a1.

What was not tested: a direct OpenAI API-key configuration (OPENAI_API_KEY) and any non-Codex agent runtime. This live proof exercised neither.

Before evidence (optional but encouraged): the pre-fix failure mode in the same environment was a lossless summarizer auth failure for direct OpenAI provider traffic: Auth lookup failed for provider "openai": No API key found for provider "openai". You are authenticated with OpenAI Codex OAuth; OpenAI agent model runs use openai/gpt-* through the Codex runtime.

Root Cause (if applicable)

Root cause: plugin LLM completion resolved openai/gpt-* as direct provider traffic even when the configured model policy mapped that OpenAI model ref to the Codex agent runtime.
Missing detection / guardrail: tests covered direct provider completion preparation but not Codex-runtime-backed OpenAI refs through the plugin runtime.llm.complete path.
Contributing context (if known): agent turns already worked with Codex OAuth, so the failure only surfaced inside plugin/context-engine summarization.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- src/agents/simple-completion-runtime.selection.test.ts
- src/agents/simple-completion-runtime.test.ts
- src/plugins/runtime/runtime-llm.runtime.test.ts
Scenario the test should lock in: canonical openai/gpt-* refs preserve their logical identity while completion execution routes through Codex when resolved agent runtime policy says codex.
Why this is the smallest reliable guardrail: it verifies runtime-provider selection at the completion seam without needing live OAuth credentials in CI.
Existing test that already covers this (if any): new/updated tests in this PR cover the selection and preparation behavior.
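As an illustration of the scenario to lock in, a credential-free check might look like the sketch below. It uses plain assertions rather than the repository's Vitest setup, and `selectForCompletion`, `PolicyResolver`, `expectEqual`, and the `"openai-codex"` provider id are hypothetical stand-ins, not the actual selection code. The point it shows is that injecting the policy resolver lets the seam be tested without live OAuth.

```typescript
type Runtime = "codex" | "direct";
type PolicyResolver = (provider: string, modelId: string) => Runtime;

interface Selection {
  provider: string;         // logical provider from the model ref
  modelId: string;          // logical model id
  runtimeProvider?: string; // execution provider, set only when rerouted
}

// Hypothetical selection seam: the policy resolver is injected so a test can
// stub it instead of relying on live configuration or credentials.
function selectForCompletion(
  provider: string,
  modelId: string,
  resolvePolicy: PolicyResolver,
): Selection {
  const runtime = resolvePolicy(provider, modelId);
  return runtime === "codex"
    ? { provider, modelId, runtimeProvider: "openai-codex" } // assumed provider id
    : { provider, modelId };
}

function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Scenario 1: policy says codex, so execution reroutes but the logical ref stays.
const rerouted = selectForCompletion("openai", "gpt-5.4-mini", () => "codex");
expectEqual(rerouted.provider, "openai", "logical provider");
expectEqual(rerouted.modelId, "gpt-5.4-mini", "logical model id");
expectEqual(rerouted.runtimeProvider, "openai-codex", "execution provider");

// Scenario 2: policy says direct, so nothing is rerouted.
const untouched = selectForCompletion("openai", "gpt-5.4-mini", () => "direct");
expectEqual(untouched.runtimeProvider, undefined, "no reroute");
```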

User-visible / Behavior Changes

Plugin/context-engine LLM summarization now works with canonical OpenAI model refs that are configured to run through Codex OAuth. Users should no longer need to add OPENAI_API_KEY just to make plugin summarization work in a Codex-runtime OpenAI setup.

Diagram (if applicable)

Before:
lossless-claw -> runtime.llm.complete(openai/gpt-5.4-mini) -> direct OpenAI auth -> missing OPENAI_API_KEY -> fallback truncation

After:
lossless-claw -> runtime.llm.complete(openai/gpt-5.4-mini) -> resolved Codex runtime provider -> Codex OAuth -> model-backed summary repair
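
The two flows above can be sketched as follows. This is an illustration under stated assumptions: `summarize`, `Completer`, the stub completers, and the truncation length are hypothetical, and the real lossless-claw summarizer is not shown in this PR. Only the `[LCM fallback summary]` marker and the error text come from the proof above.

```typescript
// Hypothetical completion function: returns summary text or throws.
type Completer = (modelRef: string, prompt: string) => string;

const FALLBACK_MARKER = "[LCM fallback summary]";

function summarize(complete: Completer, modelRef: string, sourceText: string): string {
  try {
    return complete(modelRef, `Summarize:\n${sourceText}`);
  } catch {
    // Deterministic truncation, tagged so a later repair pass can find it.
    return `${FALLBACK_MARKER} ${sourceText.slice(0, 40)}`;
  }
}

// Before: direct OpenAI auth with no API key configured.
const directOpenAi: Completer = () => {
  throw new Error('No API key found for provider "openai".');
};

// After: the call reaches a runtime that can authenticate (stubbed here).
const codexRuntime: Completer = (_modelRef, prompt) => `summary of ${prompt.length} chars`;

const sourceText = "Codex GH auth is now working via symlinked isolated HOME/Keychains.";
const beforeFix = summarize(directOpenAi, "openai/gpt-5.4-mini", sourceText);
const afterFix = summarize(codexRuntime, "openai/gpt-5.4-mini", sourceText);
```

In both calls the model ref passed by the plugin is identical; only the completer behind it differs, which mirrors the claim that the logical ref is preserved.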

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS
Runtime/container: local OpenClaw gateway
Model/provider: openai/gpt-5.5 and openai/gpt-5.4-mini routed through Codex runtime/OAuth
Integration/channel (if any): Telegram group topic
Relevant config (redacted): OpenAI model refs configured with Codex agent runtime; lossless-claw summary model uses the OpenAI/Codex-backed runtime path.

Steps

1. Deploy this PR head and restart the gateway.
2. Run /lossless doctor apply in a session containing an LCM fallback summary candidate.
3. Inspect the repaired summary with lcm_describe <summary-id>.
4. Search for remaining fallback markers with lcm_grep "LCM fallback summary".
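
The final verification step amounts to confirming that no stored summary still carries the fallback marker. A minimal sketch of that check, assuming a hypothetical in-memory `StoredSummary` shape and made-up ids; this is not how lcm_grep is implemented:

```typescript
interface StoredSummary {
  id: string;
  content: string;
}

const FALLBACK_MARKER = "[LCM fallback summary]";

// Return the ids of summaries that still carry the fallback marker.
function findFallbackSummaries(summaries: StoredSummary[]): string[] {
  return summaries
    .filter((summary) => summary.content.includes(FALLBACK_MARKER))
    .map((summary) => summary.id);
}

// Hypothetical data: one repair candidate and one healthy summary.
const beforeRepair: StoredSummary[] = [
  { id: "sum_a", content: `${FALLBACK_MARKER} truncated text` },
  { id: "sum_b", content: "A model-generated summary." },
];

// Simulate an in-place repair of the candidate.
const afterRepair: StoredSummary[] = beforeRepair.map((summary) =>
  summary.id === "sum_a"
    ? { ...summary, content: "A repaired, model-generated summary." }
    : summary,
);
```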

Expected

Doctor apply invokes the model-backed plugin LLM summarizer through Codex runtime/OAuth.
The fallback summary is replaced with a real summary.
No direct OpenAI API-key (OPENAI_API_KEY) auth failure appears.

Actual

sum_22ef5580363377a1 was repaired in place and now contains coherent summary text instead of [LCM fallback summary].
The active gateway/session was on PR head 47697a7 with OpenAI Codex OAuth.
No direct OpenAI API-key auth failure was observed during the repair.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: live lossless doctor summary repair under Codex OAuth on this PR head; targeted unit/seam tests; changed-test and formatting/lint checks.
Edge cases checked: OpenRouter model id preservation, runtime-provider selection, registry config compatibility guard.
What you did not verify: direct OpenAI API-key mode and non-Codex runtime behavior in a live gateway.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: runtime-provider selection could accidentally change the provider used for non-Codex completion calls.
- Mitigation: keep the logical model ref separate from the execution provider and cover OpenRouter/direct-provider preservation in tests.
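
The PR text does not spell out how OpenRouter model ids are preserved. One plausible hazard is an id that itself contains a slash, and a simple way to keep it intact is to split a model ref on the first slash only. The sketch below shows that idea; `parseModelRef` and the example OpenRouter-style ref are hypothetical, and the actual parsing in the repository may differ.

```typescript
interface ModelRef {
  provider: string;
  modelId: string;
}

// Split on the first "/" only: everything after it is the provider-scoped id.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Invalid model ref: ${ref}`);
  }
  return { provider: ref.slice(0, slash), modelId: ref.slice(slash + 1) };
}

const openAiRef = parseModelRef("openai/gpt-5.5");
// Hypothetical OpenRouter-style ref whose model id itself contains a slash.
const openRouterRef = parseModelRef("openrouter/vendor/model-x");
```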

Testing

pnpm test src/agents/simple-completion-runtime.selection.test.ts -- --reporter=verbose
- Passed: 7 tests
pnpm test src/agents/simple-completion-runtime.test.ts -- --reporter=verbose
- Passed: 16 tests
pnpm test src/plugins/runtime/runtime-llm.runtime.test.ts -- --reporter=verbose
- Passed: 17 tests
pnpm test:changed
- Passed: 2 Vitest shards
pnpm format:check -- CHANGELOG.md src/agents/simple-completion-runtime.ts src/agents/simple-completion-runtime.test.ts src/agents/simple-completion-runtime.selection.test.ts
- Passed
node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/simple-completion-runtime.ts src/agents/simple-completion-runtime.test.ts src/agents/simple-completion-runtime.selection.test.ts
- Passed: 0 warnings, 0 errors
git diff --check
- Passed

Known gap: pnpm check:changed currently fails in untouched src/plugins/registry.runtime-config.test.ts type assertions, unrelated to this diff.

clawsweeper · 2026-05-13T19:18:38Z

Codex review: needs maintainer review before merge.

Summary
The PR routes simple plugin LLM completions through a Codex runtime provider when existing agent runtime policy maps canonical OpenAI model refs to Codex, with focused tests, a changelog entry, and a config-boundary guard update.

Reproducibility: yes, by source inspection. Current main’s plugin runtime.llm.complete path delegates to simple-completion preparation, which currently uses the logical provider even though OpenAI agent refs can resolve to the Codex runtime. I did not run a live authenticated current-main failure in this read-only review.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix live gateway output showing a Codex-OAuth /lossless doctor apply run replacing an LCM fallback summary with model-generated summary content on the PR head.

Next step before merge
No ClawSweeper repair job is needed; this protected maintainer-labeled PR has no blocking findings and should proceed through normal review, CI, and merge gates.

Security
Cleared: The diff changes provider selection within existing runtime/model policy and does not add dependencies, workflows, package resolution, secret storage, or new command execution surfaces.

Review details

Best possible solution:

Land the narrow runtime-provider seam after normal maintainer review and CI, keeping logical model refs stable for policy/audit while aligning plugin LLM execution with existing Codex runtime auth.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection: current main’s plugin runtime.llm.complete path delegates to simple-completion preparation, which currently uses the logical provider even though OpenAI agent refs can resolve to the Codex runtime. I did not run a live authenticated current-main failure in this read-only review.

Is this the best way to solve the issue?

Yes; the PR reuses the existing harness-policy decision and separates logical provider from execution provider, which is narrower than changing plugin auth policy or rewriting public model refs.

What I checked:

Protected PR metadata: The provided live GitHub context lists the protected maintainer label, so this cleanup workflow should keep the PR open for maintainer handling rather than close it. (47697a757fea)
Current-main root cause: Current main prepares plugin simple-completion models with provider: selection.provider, so canonical openai/gpt-* selections prepare auth/transport against openai instead of the configured Codex runtime provider. (src/agents/simple-completion-runtime.ts:267, 256377c029f6)
Plugin LLM call path: runtime.llm.complete delegates to prepareSimpleCompletionModelForAgent, making simple-completion provider selection the path that controls plugin LLM auth and dispatch. (src/plugins/runtime/runtime-llm.runtime.ts:415, 256377c029f6)
Runtime policy contract: The documented model/runtime contract says openai/gpt-* agent refs run through Codex by default on the official OpenAI provider, with auth coming from Codex or openai-codex profiles. Public docs: docs/concepts/models.md. (docs/concepts/models.md:26, 256377c029f6)
Codex runtime selection source: resolveAgentHarnessPolicy already resolves official OpenAI refs to runtime: "codex" when appropriate; the PR reuses that existing policy instead of adding plugin-specific model policy. (src/agents/harness/policy.ts:17, 256377c029f6)
PR implementation: The PR adds runtimeProvider to the selection and switches preparation to selection.runtimeProvider ?? selection.provider, preserving logical audit/model identity while changing execution provider when policy resolves to Codex. (src/agents/simple-completion-runtime.ts:294, 47697a757fea)
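
The preparation change described in the last item can be sketched as below. Only the `runtimeProvider` field and the `selection.runtimeProvider ?? selection.provider` expression come from the review; `prepareCompletion`, `AuthStore`, `PreparedCompletion`, the `"openai-codex"` provider id, and the placeholder credential are assumptions for illustration, not the code at src/agents/simple-completion-runtime.ts.

```typescript
interface Selection {
  provider: string;         // logical provider from the model ref
  modelId: string;
  runtimeProvider?: string; // set when policy resolves to another runtime
}

interface PreparedCompletion {
  executionProvider: string; // used for auth and transport
  auditModelRef: string;     // logical ref, unchanged for policy and audit
  credential: string;
}

// Hypothetical credential store keyed by provider id.
type AuthStore = Map<string, string>;

function prepareCompletion(selection: Selection, authStore: AuthStore): PreparedCompletion {
  // Fall back to the logical provider when no runtime provider was selected.
  const executionProvider = selection.runtimeProvider ?? selection.provider;
  const credential = authStore.get(executionProvider);
  if (credential === undefined) {
    throw new Error(`No API key found for provider "${executionProvider}".`);
  }
  return {
    executionProvider,
    auditModelRef: `${selection.provider}/${selection.modelId}`,
    credential,
  };
}

// Only Codex OAuth is configured, mirroring the proof environment.
const authStore: AuthStore = new Map([["openai-codex", "oauth-token-placeholder"]]);
const prepared = prepareCompletion(
  { provider: "openai", modelId: "gpt-5.5", runtimeProvider: "openai-codex" },
  authStore,
);
```

With the same store, a selection that lacks `runtimeProvider` would look up `openai` and throw, which corresponds to the pre-fix failure mode.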

Likely related people:

@steipete: Local path history for the simple-completion/runtime files is dominated by Peter Steinberger commits, making this a likely routing candidate for the current runtime surface. (role: recent area contributor; confidence: medium; commits: eeef486449, 4862d34925, 943cb47274; files: src/agents/simple-completion-runtime.ts, src/plugins/runtime/runtime-llm.runtime.ts)
@jalehman: Merged context-engine/plugin-runtime work introduced and stabilized the surrounding context-engine path that exposes plugin LLM completion; this PR’s proof and affected behavior run through that area. (role: context-engine and plugin-runtime feature contributor; confidence: medium; commits: fee91fefce, 4bfa800cc7, a327b6750d; files: src/context-engine/index.ts, src/context-engine/registry.ts, src/plugins/runtime/index.ts)
@pashpashpash: The provided prior ClawSweeper review ties the OpenAI-to-Codex automatic runtime routing policy used by this PR to recent work on the harness policy and OpenAI Codex routing files. (role: runtime-routing feature contributor; confidence: medium; commits: 1c3399010815, 02fe0d8978db; files: src/agents/harness/policy.ts, src/agents/openai-codex-routing.ts)

Remaining risk / open question:

I did not rerun the focused tests in this read-only review; the verdict relies on source inspection, the PR diff, and the contributor’s posted live/test output.
The live proof covers Codex OAuth and does not exercise a direct OpenAI API-key setup or non-Codex runtime in a real gateway.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 256377c029f6.

* fix: route plugin LLM completions through Codex runtime
* fix: preserve OpenRouter completion model ids
* fix: allow registry config compat guards

(cherry picked from commit aac216d)

* fix: route plugin LLM completions through Codex runtime
* fix: preserve OpenRouter completion model ids
* fix: allow registry config compat guards

openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: S, and maintainer (Maintainer-authored PR) labels May 13, 2026

openclaw-barnacle Bot added the gateway (Gateway runtime) and scripts (Repository scripts) labels May 14, 2026

jalehman added 3 commits May 13, 2026 17:38

fix: route plugin LLM completions through Codex runtime

2727cdc

fix: preserve OpenRouter completion model ids

9280b31

fix: allow registry config compat guards

47697a7

jalehman force-pushed the josh/fix-lossless-codex-llm branch from 6fd7332 to 47697a7 Compare May 14, 2026 00:39

openclaw-barnacle Bot removed the gateway (Gateway runtime) label May 14, 2026

clawsweeper Bot added the proof: sufficient (ClawSweeper judged the real behavior proof convincing.) label May 14, 2026

jalehman merged commit aac216d into main May 14, 2026
118 of 124 checks passed

jalehman deleted the josh/fix-lossless-codex-llm branch May 14, 2026 04:02

ValantisV mentioned this pull request May 14, 2026

[Bug]: @openclaw/codex notification handlers (account/rateLimits/updated, mcpServer/startupStatus/updated) synchronously block Node event loop #81936

Closed
