fix: route plugin LLM completions through Codex runtime #81511

Merged
jalehman merged 3 commits into main from josh/fix-lossless-codex-llm
May 14, 2026
@jalehman jalehman commented May 13, 2026

Contributor

Summary

  • Problem: plugin LLM completions treated canonical openai/gpt-* model refs as direct OpenAI API calls, even when config mapped those refs to the Codex runtime.
  • Why it matters: context-engine plugins such as lossless-claw could fail LLM summarization under Codex OAuth with a missing OPENAI_API_KEY, then fall back to deterministic truncation.
  • What changed: completion runtime selection now preserves the logical model ref while routing execution through the configured runtime provider when the resolved agent runtime is Codex.
  • What did NOT change: this does not grant plugins new model access or bypass model policy; it only makes plugin LLM completion honor the already-resolved runtime policy.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior addressed: lossless-claw doctor repair needed to use model-backed plugin LLM summarization while the active OpenClaw gateway was authenticated with OpenAI Codex OAuth, not a direct OPENAI_API_KEY.

Real environment tested: Josh's live OpenClaw gateway running this PR head, OpenClaw 2026.5.12-beta.1 (47697a7), Telegram group topic session agent:main:telegram:group:-1003774691294:topic:3731, model openai/gpt-5.5, runtime OpenAI Codex, auth openai-codex:josh@martian.engineering.

Exact steps or command run after this patch: deployed this PR branch locally, rebuilt and restarted the gateway, then ran /lossless doctor apply in the same Telegram/OpenClaw session that had a known fallback LCM summary repair candidate. Verified after the run with lcm_describe sum_22ef5580363377a1 and lcm_grep "LCM fallback summary".

Evidence after fix: before doctor apply, lcm_grep "LCM fallback summary" found recent repair candidate sum_22ef5580363377a1 in this session. After doctor apply, lcm_describe sum_22ef5580363377a1 returned a real summary instead of the fallback marker:

LCM_SUMMARY sum_22ef5580363377a1
meta conv=4754 kind=leaf depth=0 tok=656 descTok=0 srcTok=39920 desc=0 range=2026-05-12 08:14 PDT..2026-05-12 08:42 PDT budgetCap=4000
content
Codex GH auth is now working via symlinked isolated HOME/Keychains: `gh api user` returned `jalehman`, and `gh repo view openclaw/openclaw` returned `openclaw/openclaw MAINTAIN`. Best fix chosen was "make it work" by symlinking Codex HOME `.config/gh` and `Library/Keychains`, not host HOME isolation or launchd `GH_TOKEN`.

A separate issue appeared: `tools.toolSearch` was supposed to be disabled, but the gateway kept serving stale in-memory config...

The live session status for the proof run showed:

OpenClaw 2026.5.12-beta.1 (47697a7)
Model: openai/gpt-5.5
Auth: oauth (openai-codex:josh@martian.engineering)
Runtime: OpenAI Codex
Session: agent:main:telegram:group:-1003774691294:topic:3731

Observed result after fix: doctor apply repaired the fallback summary in place with model-generated summary content and did not hit the previous direct OpenAI auth failure path. The repaired summary no longer contains [LCM fallback summary], and a follow-up grep for LCM fallback summary no longer returned sum_22ef5580363377a1.

What was not tested: no direct OpenAI OPENAI_API_KEY configuration was tested, and no non-Codex agent runtime was changed by this live proof.

Before evidence (optional but encouraged): the pre-fix failure mode from the same environment was lossless summarizer auth failure for direct OpenAI provider traffic: Auth lookup failed for provider "openai": No API key found for provider "openai". You are authenticated with OpenAI Codex OAuth; OpenAI agent model runs use openai/gpt-* through the Codex runtime.

Root Cause (if applicable)

  • Root cause: plugin LLM completion resolved openai/gpt-* as direct provider traffic even when the configured model policy mapped that OpenAI model ref to the Codex agent runtime.
  • Missing detection / guardrail: tests covered direct provider completion preparation but not Codex-runtime-backed OpenAI refs through the plugin runtime.llm.complete path.
  • Contributing context (if known): agent turns already worked with Codex OAuth, so the failure only surfaced inside plugin/context-engine summarization.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/simple-completion-runtime.selection.test.ts
    • src/agents/simple-completion-runtime.test.ts
    • src/plugins/runtime/runtime-llm.runtime.test.ts
  • Scenario the test should lock in: canonical openai/gpt-* refs preserve their logical identity while completion execution routes through Codex when resolved agent runtime policy says codex.
  • Why this is the smallest reliable guardrail: it verifies runtime-provider selection at the completion seam without needing live OAuth credentials in CI.
  • Existing test that already covers this (if any): new/updated tests in this PR cover the selection and preparation behavior.
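A minimal sketch of the scenario this plan locks in, assuming a selection function that takes the runtime policy as a parameter so no live OAuth credentials are needed. The names `selectForCompletion`, `RuntimePolicy`, and `expectEqual`, and the provider id `openai-codex`, are hypothetical illustrations and not the actual test code in the files listed above.

```typescript
// Hypothetical shapes; the real tests live in the files named in the test plan.
interface Selection {
  provider: string;          // logical provider taken from the model ref
  modelId: string;           // logical model id taken from the model ref
  runtimeProvider?: string;  // execution provider, present only when policy reroutes
}

type RuntimePolicy = (provider: string, modelId: string) => "codex" | "default";

// Assumed selection seam: the policy is injected, so the test needs no credentials.
function selectForCompletion(
  provider: string,
  modelId: string,
  policy: RuntimePolicy,
): Selection {
  return policy(provider, modelId) === "codex"
    ? { provider, modelId, runtimeProvider: "openai-codex" } // assumed provider id
    : { provider, modelId };
}

// Tiny assertion helper so the sketch does not depend on a test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

const codexPolicy: RuntimePolicy = () => "codex";
const selected = selectForCompletion("openai", "gpt-5.4-mini", codexPolicy);

// The logical identity is preserved while execution is routed to Codex.
expectEqual(selected.provider, "openai", "logical provider");
expectEqual(selected.modelId, "gpt-5.4-mini", "logical model id");
expectEqual(selected.runtimeProvider, "openai-codex", "execution provider");
```

Injecting the policy keeps the check at the completion seam, which is why it can run in CI without Codex OAuth.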

User-visible / Behavior Changes

Plugin/context-engine LLM summarization now works with canonical OpenAI model refs that are configured to run through Codex OAuth. Users should no longer need to add OPENAI_API_KEY just to make plugin summarization work in a Codex-runtime OpenAI setup.

Diagram (if applicable)

Before:
lossless-claw -> runtime.llm.complete(openai/gpt-5.4-mini) -> direct OpenAI auth -> missing OPENAI_API_KEY -> fallback truncation

After:
lossless-claw -> runtime.llm.complete(openai/gpt-5.4-mini) -> resolved Codex runtime provider -> Codex OAuth -> model-backed summary repair
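The two flows above can be sketched roughly as follows, assuming a gateway that holds only a Codex OAuth credential. Every name here (`credentials`, `authenticate`, `summarizeWithModel`, `summarize`) and the provider id `openai-codex` is a hypothetical stand-in and not OpenClaw's real API; the model call is stubbed out.

```typescript
// Illustrative only: all names below are hypothetical.
type Credentials = Record<string, string>;

// Assumed setup: a Codex OAuth credential exists, but no direct OpenAI API key.
const credentials: Credentials = { "openai-codex": "oauth-token" };

function authenticate(provider: string): string {
  const credential = credentials[provider];
  if (credential === undefined) {
    throw new Error(`No API key found for provider "${provider}"`);
  }
  return credential;
}

// Stand-in for a model call; a real summarizer would send the text to the model.
function summarizeWithModel(text: string, provider: string): string {
  authenticate(provider);
  return `model summary via ${provider} (${text.length} chars in)`;
}

// On auth failure, fall back to deterministic truncation with the fallback marker.
function summarize(text: string, executionProvider: string): string {
  try {
    return summarizeWithModel(text, executionProvider);
  } catch {
    return `[LCM fallback summary] ${text.slice(0, 20)}`;
  }
}

const input = "a long conversation transcript that needs summarizing";
const before = summarize(input, "openai");       // logical provider used for execution
const after = summarize(input, "openai-codex");  // resolved runtime provider
console.log(before);
console.log(after);
```

In this sketch the only difference between the two calls is which provider is used for execution; the logical model ref passed by the plugin is the same in both.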

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local OpenClaw gateway
  • Model/provider: openai/gpt-5.5 and openai/gpt-5.4-mini routed through Codex runtime/OAuth
  • Integration/channel (if any): Telegram group topic
  • Relevant config (redacted): OpenAI model refs configured with Codex agent runtime; lossless-claw summary model uses the OpenAI/Codex-backed runtime path.

Steps

  1. Deploy this PR head and restart the gateway.
  2. Run /lossless doctor apply in a session containing an LCM fallback summary candidate.
  3. Inspect the repaired summary with lcm_describe <summary-id>.
  4. Search for remaining fallback markers with lcm_grep "LCM fallback summary".

Expected

  • Doctor apply invokes the model-backed plugin LLM summarizer through Codex runtime/OAuth.
  • The fallback summary is replaced with a real summary.
  • No direct OpenAI OPENAI_API_KEY auth failure appears.

Actual

  • sum_22ef5580363377a1 was repaired in place and now contains coherent summary text instead of [LCM fallback summary].
  • The active gateway/session was on PR head 47697a7 with OpenAI Codex OAuth.
  • No direct OpenAI API-key auth failure was observed during the repair.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: live lossless doctor summary repair under Codex OAuth on this PR head; targeted unit/seam tests; changed-test and formatting/lint checks.
  • Edge cases checked: OpenRouter model id preservation, runtime-provider selection, registry config compatibility guard.
  • What you did not verify: direct OpenAI API-key mode and non-Codex runtime behavior in a live gateway.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: runtime-provider selection could accidentally change the provider used for non-Codex completion calls.
    • Mitigation: keep the logical model ref separate from the execution provider and cover OpenRouter/direct-provider preservation in tests.
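One way to picture this mitigation is a guard that leaves non-Codex refs untouched. The sketch below assumes that "preservation" means the nested OpenRouter model id stays intact and the execution provider stays equal to the logical provider; `parseModelRef`, `executionProviderFor`, and the provider id `openai-codex` are hypothetical names, and the actual change in this PR may be structured differently.

```typescript
// Hypothetical helper names; a sketch of the guard, not the code in this PR.
interface ModelRef {
  provider: string;
  modelId: string;
}

// Split on the first slash only, so nested ids such as
// "anthropic/claude-3.5-sonnet" stay intact for aggregator providers.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), modelId: ref.slice(slash + 1) };
}

// Only a "codex" runtime decision changes the execution provider (assumed id).
function executionProviderFor(ref: ModelRef, runtime: "codex" | "default"): string {
  return runtime === "codex" ? "openai-codex" : ref.provider;
}

const openRouterRef = parseModelRef("openrouter/anthropic/claude-3.5-sonnet");
const openRouterExecution = executionProviderFor(openRouterRef, "default");
```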

Testing

  • pnpm test src/agents/simple-completion-runtime.selection.test.ts -- --reporter=verbose
    • Passed: 7 tests
  • pnpm test src/agents/simple-completion-runtime.test.ts -- --reporter=verbose
    • Passed: 16 tests
  • pnpm test src/plugins/runtime/runtime-llm.runtime.test.ts -- --reporter=verbose
    • Passed: 17 tests
  • pnpm test:changed
    • Passed: 2 Vitest shards
  • pnpm format:check -- CHANGELOG.md src/agents/simple-completion-runtime.ts src/agents/simple-completion-runtime.test.ts src/agents/simple-completion-runtime.selection.test.ts
    • Passed
  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/simple-completion-runtime.ts src/agents/simple-completion-runtime.test.ts src/agents/simple-completion-runtime.selection.test.ts
    • Passed: 0 warnings, 0 errors
  • git diff --check
    • Passed

Known gap: pnpm check:changed currently fails in untouched src/plugins/registry.runtime-config.test.ts type assertions, unrelated to this diff.

@openclaw-barnacle openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: S, and maintainer (Maintainer-authored PR) labels May 13, 2026

clawsweeper Bot commented May 13, 2026

Contributor

Codex review: needs maintainer review before merge.

Summary
The PR routes simple plugin LLM completions through a Codex runtime provider when existing agent runtime policy maps canonical OpenAI model refs to Codex, with focused tests, a changelog entry, and a config-boundary guard update.

Reproducibility: yes, by source inspection. Current main’s plugin runtime.llm.complete path delegates to simple-completion preparation, which currently uses the logical provider even though OpenAI agent refs can resolve to the Codex runtime. I did not run a live authenticated current-main failure in this read-only review.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix live gateway output showing a Codex-OAuth /lossless doctor apply run replacing an LCM fallback summary with model-generated summary content on the PR head.

Next step before merge
No ClawSweeper repair job is needed; this protected maintainer-labeled PR has no blocking findings and should proceed through normal review, CI, and merge gates.

Security
Cleared: The diff changes provider selection within existing runtime/model policy and does not add dependencies, workflows, package resolution, secret storage, or new command execution surfaces.

Review details

Best possible solution:

Land the narrow runtime-provider seam after normal maintainer review and CI, keeping logical model refs stable for policy/audit while aligning plugin LLM execution with existing Codex runtime auth.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection: current main’s plugin runtime.llm.complete path delegates to simple-completion preparation, which currently uses the logical provider even though OpenAI agent refs can resolve to the Codex runtime. I did not run a live authenticated current-main failure in this read-only review.

Is this the best way to solve the issue?

Yes; the PR reuses the existing harness-policy decision and separates logical provider from execution provider, which is narrower than changing plugin auth policy or rewriting public model refs.

What I checked:

  • Protected PR metadata: The provided live GitHub context lists the protected maintainer label, so this cleanup workflow should keep the PR open for maintainer handling rather than close it. (47697a757fea)
  • Current-main root cause: Current main prepares plugin simple-completion models with provider: selection.provider, so canonical openai/gpt-* selections prepare auth/transport against openai instead of the configured Codex runtime provider. (src/agents/simple-completion-runtime.ts:267, 256377c029f6)
  • Plugin LLM call path: runtime.llm.complete delegates to prepareSimpleCompletionModelForAgent, making simple-completion provider selection the path that controls plugin LLM auth and dispatch. (src/plugins/runtime/runtime-llm.runtime.ts:415, 256377c029f6)
  • Runtime policy contract: The documented model/runtime contract says openai/gpt-* agent refs run through Codex by default on the official OpenAI provider, with auth coming from Codex or openai-codex profiles. Public docs: docs/concepts/models.md. (docs/concepts/models.md:26, 256377c029f6)
  • Codex runtime selection source: resolveAgentHarnessPolicy already resolves official OpenAI refs to runtime: "codex" when appropriate; the PR reuses that existing policy instead of adding plugin-specific model policy. (src/agents/harness/policy.ts:17, 256377c029f6)
  • PR implementation: The PR adds runtimeProvider to the selection and switches preparation to selection.runtimeProvider ?? selection.provider, preserving logical audit/model identity while changing execution provider when policy resolves to Codex. (src/agents/simple-completion-runtime.ts:294, 47697a757fea)
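The fallback described in the last item can be sketched as below. Only the field names `provider` and `runtimeProvider` and the expression `selection.runtimeProvider ?? selection.provider` come from the review; the surrounding types, the `prepareCompletion` function, the `auditModelRef` field, and the provider id `openai-codex` are assumptions made for illustration.

```typescript
// Hypothetical shapes around the one expression quoted in the review.
interface SimpleCompletionSelection {
  provider: string;          // logical provider, kept for policy and audit
  modelId: string;
  runtimeProvider?: string;  // set only when runtime policy resolves to Codex
}

interface PreparedCompletion {
  executionProvider: string; // where auth and transport are prepared
  auditModelRef: string;     // logical ref, unchanged by routing
}

function prepareCompletion(selection: SimpleCompletionSelection): PreparedCompletion {
  return {
    // Prefer the runtime provider; fall back to the logical provider otherwise.
    executionProvider: selection.runtimeProvider ?? selection.provider,
    auditModelRef: `${selection.provider}/${selection.modelId}`,
  };
}

const codexPrepared = prepareCompletion({
  provider: "openai",
  modelId: "gpt-5.5",
  runtimeProvider: "openai-codex", // assumed provider id
});
const directPrepared = prepareCompletion({ provider: "openai", modelId: "gpt-5.5" });
```

Because the field is optional, a selection without `runtimeProvider` prepares exactly as it did before, which matches the backward-compatibility claim in the PR body.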

Likely related people:

  • @steipete: Local path history for the simple-completion/runtime files is dominated by Peter Steinberger commits, making this a likely routing candidate for the current runtime surface. (role: recent area contributor; confidence: medium; commits: eeef486449, 4862d34925, 943cb47274; files: src/agents/simple-completion-runtime.ts, src/plugins/runtime/runtime-llm.runtime.ts)
  • @jalehman: Merged context-engine/plugin-runtime work introduced and stabilized the surrounding context-engine path that exposes plugin LLM completion; this PR’s proof and affected behavior run through that area. (role: context-engine and plugin-runtime feature contributor; confidence: medium; commits: fee91fefce, 4bfa800cc7, a327b6750d; files: src/context-engine/index.ts, src/context-engine/registry.ts, src/plugins/runtime/index.ts)
  • @pashpashpash: The provided prior ClawSweeper review ties the OpenAI-to-Codex automatic runtime routing policy used by this PR to recent work on the harness policy and OpenAI Codex routing files. (role: runtime-routing feature contributor; confidence: medium; commits: 1c3399010815, 02fe0d8978db; files: src/agents/harness/policy.ts, src/agents/openai-codex-routing.ts)

Remaining risk / open question:

  • I did not rerun the focused tests in this read-only review; the verdict relies on source inspection, the PR diff, and the contributor’s posted live/test output.
  • The live proof covers Codex OAuth and does not exercise a direct OpenAI API-key setup or non-Codex runtime in a real gateway.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 256377c029f6.

@openclaw-barnacle openclaw-barnacle Bot added the gateway (Gateway runtime) and scripts (Repository scripts) labels May 14, 2026
@jalehman jalehman force-pushed the josh/fix-lossless-codex-llm branch from 6fd7332 to 47697a7 Compare May 14, 2026 00:39
@openclaw-barnacle openclaw-barnacle Bot removed the gateway Gateway runtime label May 14, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 14, 2026
@jalehman jalehman merged commit aac216d into main May 14, 2026
118 of 124 checks passed
@jalehman jalehman deleted the josh/fix-lossless-codex-llm branch May 14, 2026 04:02
steipete pushed a commit that referenced this pull request May 14, 2026
* fix: route plugin LLM completions through Codex runtime

* fix: preserve OpenRouter completion model ids

* fix: allow registry config compat guards

(cherry picked from commit aac216d)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix: route plugin LLM completions through Codex runtime

* fix: preserve OpenRouter completion model ids

* fix: allow registry config compat guards
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix: route plugin LLM completions through Codex runtime

* fix: preserve OpenRouter completion model ids

* fix: allow registry config compat guards
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix: route plugin LLM completions through Codex runtime

* fix: preserve OpenRouter completion model ids

* fix: allow registry config compat guards

Labels

  • agents (Agent runtime and tooling)
  • maintainer (Maintainer-authored PR)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • scripts (Repository scripts)
  • size: S


1 participant