
Fix OpenAI compaction runtime routing #85413

Closed
VACInc wants to merge 4 commits into openclaw:main from VACInc:fix-openai-compaction-runtime-routing



VACInc (Contributor) commented May 22, 2026

Summary

  • Route embedded OpenAI compaction through the selected runtime provider only when Codex auth is actually configured or explicitly selected.
  • Use the same selected/persisted harness runtime for compaction auth routing and context-window sizing.
  • Align local preflight compaction with configured OpenAI Responses server-compaction thresholds so replies do not stall on unexpected local compaction below the server threshold.
  • Add regression coverage for Codex OAuth compaction routing, direct OpenAI API-key preservation, persisted runtime context windows, post-compaction token state, and server-threshold gating.

Root Cause

OpenAI model refs such as openai/gpt-5.5 can run through the Codex harness, but the direct embedded compaction path still mixed decisions based on the raw model provider, the selected runtime provider, and the context-budget provider. As a result, budget preflight compaction could fall back from a stale or missing native Codex thread binding into the context engine, and then resolve auth in direct OpenAI API-key mode instead of the selected Codex OAuth route.
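The routing rule from the Summary (use the selected runtime provider only when Codex auth is configured or explicitly selected) can be sketched as one small pure function. This is a minimal illustration, not the resolveSelectedOpenAIPiRuntimeProvider code from the PR: the RoutingInput shape, the runtime ids "embedded" and "codex", and resolveCompactionAuthProvider are hypothetical names; only the provider ids openai and openai-codex come from the PR.

```typescript
// Hypothetical runtime ids; the real OpenClaw identifiers may differ.
type RuntimeId = "embedded" | "codex";

interface RoutingInput {
  modelProvider: string; // provider from the model ref, e.g. "openai"
  selectedRuntime: RuntimeId; // runtime chosen for this session
  codexAuthConfigured: boolean; // a Codex OAuth profile exists
  codexAuthSelected: boolean; // a Codex auth profile was explicitly selected
}

// Returns the provider id that compaction should authenticate against.
function resolveCompactionAuthProvider(input: RoutingInput): string {
  // Non-OpenAI providers are never rerouted.
  if (input.modelProvider !== "openai") return input.modelProvider;
  const hasCodexAuth = input.codexAuthConfigured || input.codexAuthSelected;
  // Only reroute to openai-codex when the Codex runtime is selected AND Codex auth exists.
  if (input.selectedRuntime === "codex" && hasCodexAuth) return "openai-codex";
  // Otherwise keep the direct OpenAI API-key route.
  return "openai";
}
```

Keeping the decision in a single function makes both routes easy to pin down in tests: Codex OAuth sessions resolve to openai-codex, and API-key-only sessions stay on openai even when the Codex runtime is selected.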

A second mismatch made replies much slower: local preflight compaction used contextWindow - reserve - softThreshold even when the model params configured OpenAI Responses server compaction at a higher threshold. In the observed local setup, this triggered preflight compaction in the mid-100k token range even though the configured server threshold was 200k, so Telegram ingress worked but replies appeared stuck before normal inference.
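The gating mismatch can be illustrated with a minimal TypeScript sketch. It assumes the local gate is exactly contextWindowTokens - reserveTokensFloor - softThresholdTokens, compared with >=, and that honoring the server threshold means raising the effective local threshold to the larger of the two values. PreflightParams, localPreflightThreshold, and shouldRunLocalPreflight are hypothetical names, not the functions in agent-runner-memory.ts.

```typescript
interface PreflightParams {
  contextWindowTokens: number;
  reserveTokensFloor: number;
  softThresholdTokens: number;
  responsesServerCompaction?: boolean; // model param: server compaction enabled
  responsesCompactThreshold?: number; // model param: server compaction threshold
}

// Local-only threshold: the math the preflight gate uses today.
function localPreflightThreshold(p: PreflightParams): number {
  return p.contextWindowTokens - p.reserveTokensFloor - p.softThresholdTokens;
}

// Decide whether local preflight compaction should run for this session size.
function shouldRunLocalPreflight(sessionTokens: number, p: PreflightParams): boolean {
  const local = localPreflightThreshold(p);
  if (p.responsesServerCompaction && p.responsesCompactThreshold !== undefined) {
    // Server compaction is configured: never compact locally below its threshold.
    return sessionTokens >= Math.max(local, p.responsesCompactThreshold);
  }
  return sessionTokens >= local;
}
```

With the numbers from the review (200k context window, 60k reserve, 4k soft threshold), the local threshold is 136,000 tokens, so a 161,077-token session compacts under local math alone but not once a 200,000-token server threshold is configured. Whether the effective threshold should also be capped at the context window is a separate design choice this sketch leaves open.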

RCA proof:

  • Before behavior: gateway logs showed Telegram ingress followed by a native Codex compaction binding fallback, context-engine compaction, and, in earlier runs, a No_API_key_found_for_provider_openai error on the compaction path.
  • Code evidence: src/agents/pi-embedded-runner/compact.ts resolved compaction model/auth before consistently applying the selected OpenAI runtime provider, and its context-window lookup could use the current policy runtime instead of the persisted session runtime.
  • Code evidence: src/auto-reply/reply/agent-runner-memory.ts preflight gating ignored responsesServerCompaction / responsesCompactThreshold model params when deciding whether to run local compaction.
  • Regression chain: recent compaction changes made required preflight compaction failures visible and made stale/missing native bindings fall through to context-engine compaction, exposing both the auth-route mismatch and the over-eager local preflight gate.
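The context-window part of the fix amounts to resolving the runtime once, preferring the runtime persisted on the session over the current policy runtime, and using that single value for both auth routing and sizing. A minimal sketch under those assumptions; planCompaction, CompactionPlan, and the per-runtime window table are hypothetical, and the window sizes are whatever the caller supplies rather than real model limits.

```typescript
// Hypothetical runtime ids; the real OpenClaw identifiers may differ.
type RuntimeId = "embedded" | "codex";

interface CompactionPlan {
  runtime: RuntimeId; // runtime used for BOTH auth routing and sizing
  contextWindowTokens: number;
}

// Resolve the runtime once so auth routing and context sizing cannot disagree.
function planCompaction(
  persistedRuntime: RuntimeId | undefined,
  policyRuntime: RuntimeId,
  contextWindows: Record<RuntimeId, number>,
): CompactionPlan {
  // Prefer the runtime the session actually ran on; fall back to current policy.
  const runtime = persistedRuntime ?? policyRuntime;
  return { runtime, contextWindowTokens: contextWindows[runtime] };
}
```

Passing the returned plan to both the auth-routing and the budget code is what prevents the split the RCA describes, where sizing followed the current policy runtime while the session was bound to a different one.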

Real behavior proof

Behavior addressed: OpenAI GPT-5.5 compaction under Codex OAuth no longer asks for direct OpenAI API-key auth, direct OpenAI API-key configurations are preserved, and local preflight compaction waits for the configured server threshold instead of blocking replies early.

Real environment tested: local OpenClaw checkout and managed gateway on Linux, Node 25.9.0, configured default model openai/gpt-5.5 with Codex runtime/OAuth.

Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/openai-codex-routing.test.ts src/agents/pi-embedded-runner/compact.hooks.test.ts src/auto-reply/reply/agent-runner-memory.test.ts; pnpm build; oc-update --skip-codex; pnpm openclaw status --all.

Evidence after fix: focused Vitest reported Test Files 4 passed (4) and Tests 131 passed (131); build completed successfully; oc-update --skip-codex rebuilt core/UI, reinstalled the daemon, and health check passed on retry; status reported Telegram OK and the gateway reachable.

Observed result after fix: the compaction unit path routes configured Codex OAuth sessions through openai-codex, keeps API-key-only OpenAI sessions on openai, uses the persisted Codex runtime for context-window sizing, and skips local preflight compaction below the configured OpenAI Responses server threshold.

What was not tested: no new live Telegram message was sent through the production topic after the patch; live status confirms Telegram/gateway readiness, while behavior proof is targeted unit coverage plus local build/update/status.

Verification

  • node scripts/run-vitest.mjs src/agents/openai-codex-routing.test.ts src/agents/pi-embedded-runner/compact.hooks.test.ts src/auto-reply/reply/agent-runner-memory.test.ts
  • pnpm build
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main found two accepted P2 issues; both were patched and covered by regression tests. Per user request, no further autoreview was run after the final API-key preservation patch.
  • oc-update --skip-codex
  • pnpm openclaw status --all

openclaw-barnacle (bot) added the labels agents, size: S, and triage: mock-only-proof on May 22, 2026.
clawsweeper (bot, Contributor) commented May 22, 2026

Codex review: needs real behavior proof before merge.

Latest ClawSweeper review: 2026-05-22 17:00 UTC / May 22, 2026, 1:00 PM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR changes OpenAI/Codex compaction provider selection, persisted runtime context-window selection, hook scheduling, provider auth prewarm defaults, /status provider-usage loading, and related regression tests.

Reproducibility: yes, from source inspection, but not from a live oversized compaction run here. Current main routes the preflight gate through local context/reserve threshold math and does not consult the configured OpenAI Responses server-compaction threshold.

PR rating
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🧂 unranked krab
Summary: The PR has useful tests and direction, but the server-threshold implementation gap plus missing real auth-route proof make it not quality-ready yet.

Rank-up moves:

  • Implement the production responsesCompactThreshold/server-compaction gate and ensure the new test fails on current main.
  • Add redacted terminal logs or live output from an oversized OpenAI/Codex compaction run showing openai-codex auth is used after the fix.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body reports focused tests, build, update, and status checks, but does not show redacted after-fix output from a real oversized OpenAI/Codex compaction run exercising the changed auth route. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • The PR body still lacks redacted after-fix output from a real oversized OpenAI/Codex compaction run showing compaction authenticates through openai-codex instead of falling back to direct OpenAI API-key auth.
  • The diff changes provider/auth selection for compaction, so mistakes can affect Codex OAuth, direct OpenAI API-key sessions, or persisted runtime context sizing.
  • The PR also disables provider auth prewarm by default unless OPENCLAW_ENABLE_PROVIDER_AUTH_PREWARM is truthy, which changes existing auth-startup behavior and should be an explicit maintainer choice if retained.
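The prewarm default change is the difference between opt-out and opt-in. The sketch below contrasts the two behaviors as the review describes them; the config shape and the strings treated as truthy ("1", "true", "yes", "on") are assumptions, since neither the PR nor the review defines how OpenClaw parses OPENCLAW_ENABLE_PROVIDER_AUTH_PREWARM.

```typescript
type Env = Record<string, string | undefined>;

// Assumed truthy spellings; the real parser may accept a different set.
const TRUTHY = new Set(["1", "true", "yes", "on"]);

function isTruthyEnv(value: string | undefined): boolean {
  return value !== undefined && TRUTHY.has(value.trim().toLowerCase());
}

// Current main (per the review): prewarm is scheduled unless explicitly disabled.
function prewarmEnabledOnMain(config: { enabled?: boolean }): boolean {
  return config.enabled !== false;
}

// PR head (per the review): prewarm runs only when the env var is truthy.
function prewarmEnabledOnPrHead(env: Env): boolean {
  return isTruthyEnv(env.OPENCLAW_ENABLE_PROVIDER_AUTH_PREWARM);
}
```

With no config and no environment variable set, the first function returns true and the second returns false, which is the startup behavior change a maintainer would need to accept explicitly or split into a separate PR.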

Maintainer options:

  1. Fix threshold gating and require auth-route proof (recommended)
    Patch the production preflight threshold logic first, then require redacted oversized compaction output showing openai-codex auth succeeds and API-key-only OpenAI sessions remain direct.
  2. Accept with maintainer-run verification
    A maintainer with Codex OAuth can intentionally own the auth-provider risk by running and recording equivalent oversized compaction proof before merge.
  3. Split unrelated startup/default changes
    If provider auth prewarm defaulting is not part of this bug fix, pause or split that behavior into a separate maintainer-reviewed PR.

Next step before merge
There is a narrow mechanical repair: wire the configured OpenAI Responses server-compaction threshold into runPreflightCompactionIfNeeded before this PR can be considered correct.

Security
Cleared: No dependency, workflow, lockfile, secret-storage, or external code-execution change was found; auth-routing behavior is tracked as merge risk.

Review findings

  • [P2] Implement the server threshold gate in production — src/auto-reply/reply/agent-runner-memory.test.ts:944-967
Review details

Best possible solution:

Land only after the production preflight gate honors configured OpenAI Responses server thresholds, focused tests fail on current main and pass on the branch, and redacted real compaction proof confirms Codex OAuth and direct OpenAI API-key routes still behave correctly.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection, but not from a live oversized compaction run here. Current main routes the preflight gate through local context/reserve threshold math and does not consult the configured OpenAI Responses server-compaction threshold.

Is this the best way to solve the issue?

No, not yet. The OpenAI/Codex auth-routing direction looks maintainable, but the patch is incomplete until the production preflight gate uses the configured server threshold and the auth-sensitive path has real proof.

Label changes:

  • add rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🦪 silver shellfish, patch quality is 🧂 unranked krab, and the PR has useful tests and direction, but the server-threshold implementation gap plus missing real auth-route proof make it not quality-ready yet.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P2: This is a focused agent runtime/auth routing bugfix with limited surface area but real impact on OpenAI/Codex compaction workflows.
  • merge-risk: 🚨 auth-provider: The PR changes provider/auth routing for compaction and provider auth prewarm defaults, which can affect Codex OAuth and direct OpenAI API-key behavior.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🦪 silver shellfish, patch quality is 🧂 unranked krab, and the PR has useful tests and direction, but the server-threshold implementation gap plus missing real auth-route proof make it not quality-ready yet.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body reports focused tests, build, update, and status checks, but does not show redacted after-fix output from a real oversized OpenAI/Codex compaction run exercising the changed auth route. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Full review comments:

  • [P2] Implement the server threshold gate in production — src/auto-reply/reply/agent-runner-memory.test.ts:944-967
    The new test config sets responsesCompactThreshold: 200_000 and expects a 161k-token session not to compact, but runPreflightCompactionIfNeeded still computes the gate as contextWindowTokens - reserveTokensFloor - softThresholdTokens and never reads these model params. With the default 200k context window plus the test's 60k reserve and 4k soft threshold, production still triggers compaction around 136k, so the early preflight compaction path remains unfixed.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.86

Acceptance criteria:

  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-memory.test.ts src/agents/openai-codex-routing.test.ts src/agents/pi-embedded-runner/compact.hooks.test.ts

What I checked:

  • Current preflight gate still uses local threshold math: Current main computes preflight compaction from contextWindowTokens - reserveTokensFloor - softThresholdTokens and calls shouldRunPreflightCompaction with those values; this path does not read responsesServerCompaction or responsesCompactThreshold. (src/auto-reply/reply/agent-runner-memory.ts:696, a0358bbf1857)
  • PR test asserts server-threshold behavior: The PR adds a test with responsesServerCompaction: true and responsesCompactThreshold: 200_000, then expects a 161,077-token session not to compact, but no matching production change appears in the PR diff for agent-runner-memory.ts. (src/auto-reply/reply/agent-runner-memory.test.ts:919, f0260149d3f6)
  • Configured Responses threshold lives in model extra params: The OpenAI Responses payload policy already reads responsesCompactThreshold from prepared extra params, and extra params are sourced from agents.defaults.models[provider/model].params, so the production preflight path has a concrete config source it can reuse. (src/agents/openai-responses-payload-policy.ts:341, a0358bbf1857)
  • PR changes auth-sensitive provider routing: The PR head changes resolveSelectedOpenAIPiRuntimeProvider so Codex runtime only routes OpenAI through openai-codex when a Codex auth profile is configured or selected. (src/agents/openai-codex-routing.ts:95, f0260149d3f6)
  • Provider auth prewarm default changes: Current main passes provider auth prewarm config without enabled, which means startup post-attach schedules it unless explicitly disabled; the PR head changes server.impl.ts to enable it only when OPENCLAW_ENABLE_PROVIDER_AUTH_PREWARM is truthy. (src/gateway/server.impl.ts:1579, a0358bbf1857)
  • History points to compaction and memory-gate owners: A shallow/current-history pass shows the central compaction and memory preflight files have recent work by Peter Steinberger, Vincent Koc, Tak Hoffman, and openperf, with Peter having the largest shortlog count across the sampled files. (src/auto-reply/reply/agent-runner-memory.ts:617, a0358bbf1857)
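Because the threshold already lives in agents.defaults.models[provider/model].params, the production preflight gate could read it from the same place. A minimal sketch, assuming the config shape below and assuming the threshold should count only when responsesServerCompaction is true; the interfaces and resolveServerCompactThreshold are hypothetical and do not mirror openai-responses-payload-policy.ts.

```typescript
interface ModelParams {
  responsesServerCompaction?: boolean;
  responsesCompactThreshold?: number;
}

// Assumed subset of the OpenClaw config relevant to this lookup.
interface AgentsConfig {
  agents?: {
    defaults?: {
      models?: Record<string, { params?: ModelParams }>;
    };
  };
}

// Look up the per-model Responses threshold keyed by "provider/model".
function resolveServerCompactThreshold(
  cfg: AgentsConfig,
  provider: string,
  model: string,
): number | undefined {
  const params = cfg.agents?.defaults?.models?.[`${provider}/${model}`]?.params;
  // Only honor the threshold when server compaction is switched on (assumption).
  if (!params?.responsesServerCompaction) return undefined;
  const threshold = params.responsesCompactThreshold;
  // Ignore missing, non-finite, or non-positive values.
  return typeof threshold === "number" && Number.isFinite(threshold) && threshold > 0
    ? threshold
    : undefined;
}
```

The returned value could feed the threshold comparison in runPreflightCompactionIfNeeded; returning undefined keeps the current local-only behavior for models without server compaction.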

Likely related people:

  • Peter Steinberger: Recent history shows repeated work on memory flush policy, embedded compaction tests, status/runtime behavior, and the largest sampled shortlog count across the central files. (role: recent area contributor; confidence: high; commits: e510042870cf, c90cb9c3c95c, 37625cff6fff; files: src/auto-reply/reply/agent-runner-memory.ts, src/auto-reply/reply/memory-flush.ts, src/agents/pi-embedded-runner/compact.ts)
  • Vincent Koc: Recent history includes the queued embedded compaction wrapper split and several runtime seam/cycle refactors touching the sampled compaction area. (role: adjacent runtime contributor; confidence: medium; commits: f1c4e2f11daf, 74e7b8d47b18, 61da711b1a7e; files: src/agents/pi-embedded-runner/compact.ts, src/auto-reply/reply/agent-runner-memory.ts)
  • Tak Hoffman: Recent history includes context-window tightening and compaction sender identity fixes in the same runtime and memory-pressure area. (role: adjacent compaction/context contributor; confidence: medium; commits: 4f00b769251d, cfae8fd1e927; files: src/agents/pi-embedded-runner/compact.ts, src/auto-reply/reply/agent-runner-memory.ts)
  • openperf: Current shallow blame for the preflight threshold and compaction routing lines points to commit 19ff77e, so this account is relevant for the latest main snapshot even though deeper provenance is shared. (role: recent line owner in shallow checkout; confidence: medium; commits: 19ff77e9c9fb; files: src/auto-reply/reply/agent-runner-memory.ts, src/agents/pi-embedded-runner/compact.ts, src/agents/openai-codex-routing.ts)
  • clawsweeper[bot]: The provider auth prewarm behavior changed recently in commit 69255f8, which is relevant because this PR changes that default again. (role: recent adjacent auth-startup contributor; confidence: medium; commits: 69255f8f328d; files: src/gateway/server.impl.ts, src/gateway/server-startup-post-attach.ts, src/agents/model-provider-auth.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a0358bbf1857.

clawsweeper (bot) added the labels rating: 🦪 silver shellfish and status: 📣 needs proof on May 22, 2026.
clawsweeper (bot, Contributor) commented May 22, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

clawsweeper (bot) added the labels P2 and merge-risk: 🚨 auth-provider on May 22, 2026.
clawsweeper (bot) added the label rating: 🧂 unranked krab and removed rating: 🦪 silver shellfish on May 22, 2026.
VACInc closed this on May 26, 2026.

Labels

  • agents (Agent runtime and tooling)
  • extensions: codex
  • gateway (Gateway runtime)
  • merge-risk: 🚨 auth-provider (May break OAuth, tokens, provider routing, model choice, or credentials.)
  • P2 (Normal backlog priority with limited blast radius.)
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • size: L
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.)
