Fix live model inference edge cases #88946

Merged
steipete merged 30 commits into main from inference on Jun 2, 2026

steipete commented Jun 1, 2026

Contributor

Summary

  • let CLI providers return the configured silent no-reply payload on clean empty output when the caller already allows silent replies, avoiding wrong-model fallback reruns
  • send explicit typed Responses input message items with input_text content so Azure AI Foundry project endpoints accept shared openai-responses payloads
  • reject HTML/non-JSON custom OpenAI-compatible verification responses and fail streamed 200 HTML runtime responses with a baseUrl /v1 hint
  • fix ACP/ACPX startup model propagation, including configured agent primary models and sessionOptions handoff
  • fix explicit model alias resolution before capability/thinking validation
  • keep thinking fallback downgrades turn-local so stored explicit session thinkingLevel overrides survive replies/runs
  • fix Codex nano/API-key runs by avoiding unsupported tool_search paths, including Codex v1 multi-agent deferral
  • fix Google simple-completion thinking payloads and Gemma 4 shorthand normalization
  • avoid showing configured fallback chains as active fallbacks when a session-selected model is pinned
  • clear stale pending live-model-switch state when sessions.patch resets the model pin
  • fix macOS Voice Wake/PTT/Talk Mode sends to inherit session thinking unless the UI selects an override
  • support custom OpenAI Responses-compatible onboarding for endpoints that expose /responses but not /chat/completions
  • align OpenAI Codex OAuth model selection so legacy codex/ primaries move to the selected canonical openai/ allowlist entry
  • fail closed when Codex native tool calls finish without matching tool results so trajectories keep durable failure proof
  • route baseUrl-only google-vertex providers through native Vertex streamGenerateContent while preserving explicit OpenAI-compatible Vertex endpoints
  • recover complete DeepSeek DSML tool-call text on OpenAI-compatible completions streams, including split chunks, without executing malformed DSML
  • backfill Azure/OpenAI Responses completed output items when providers send the final response without per-item stream events
  • preserve unescaped Windows path segments in streamed tool-call JSON arguments instead of decoding control characters
  • recover cron add/update tool-call parameters when local model parsers merge adjacent JSON property names
  • keep loopback MCP native-tool dedup exclusions out of inherited tool deny policy for claude-cli sessions
  • strip inbound metadata/delivery scaffolding from outbound message.send text and suppress metadata-only sends before channel dispatch
  • update command tests for the current model-selection resolver seam
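
For illustration, the bullet above about rejecting HTML/non-JSON verification responses could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `VerificationResult` and `classifyVerificationBody` are hypothetical names, and the detection heuristic (content type plus a leading `<!doctype` or `<html` tag) is an assumption.

```typescript
// Hypothetical result type for this sketch; not the project's actual API.
type VerificationResult =
  | { ok: true; json: unknown }
  | { ok: false; reason: string };

// Classify a verification response body: accept JSON, reject HTML and other text.
function classifyVerificationBody(
  status: number,
  contentType: string | null,
  body: string,
): VerificationResult {
  const trimmed = body.trimStart();
  // Assumption: HTML is recognized by content type or by a leading doctype/html tag.
  const looksLikeHtml =
    (contentType ?? "").toLowerCase().includes("text/html") ||
    /^<(!doctype|html)\b/i.test(trimmed);
  if (looksLikeHtml) {
    return {
      ok: false,
      reason: `HTTP ${status} returned HTML; check that baseUrl ends with /v1`,
    };
  }
  try {
    return { ok: true, json: JSON.parse(trimmed) };
  } catch {
    return { ok: false, reason: `HTTP ${status} returned a non-JSON body` };
  }
}
```

The point of the sketch is that a 200 status alone is not treated as success: a gateway or landing page that answers with HTML is reported with a hint about the base URL instead of being parsed as a model response.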

Fixes #85806.
Fixes #83810.
Fixes #74305.
Fixes #87381.
Fixes #87740.
Fixes #84688.
Fixes #63685.
Fixes #88039.
Fixes #83192.
Fixes #87768.
Fixes #44870.
Fixes #88456.
Fixes #86808.
Fixes #84804.
Fixes #84697.
Fixes #84109.
Fixes #89008.
Fixes #85918.
Fixes #88833.
Fixes #88918.
Fixes #88439.
Fixes #89242.
Fixes #89241.

Partially addresses #89100 (FM-3 outbound scaffolding leak; FM-2 group target routing remains open).

Verification

  • node scripts/run-vitest.mjs src/llm/utils/json-parse.test.ts

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -t "Azure Responses completed"

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -t "DeepSeek DSML"

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -t "tool calls"

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/cli-runner.ts src/agents/cli-runner/types.ts src/agents/cli-runner.reliability.test.ts

  • node scripts/run-vitest.mjs src/agents/cli-runner.before-agent-reply-cron.test.ts src/agents/cli-runner.context-engine.test.ts src/agents/cli-runner.reliability.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/openai-transport-stream.ts src/agents/openai-transport-stream.test.ts

  • node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts extensions/microsoft-foundry/index.test.ts src/agents/openai-responses-payload-policy.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/gateway/sessions-patch.ts src/gateway/sessions-patch.test.ts src/commands/onboard-custom.ts src/commands/onboard-custom.test.ts src/agents/provider-transport-fetch.ts src/agents/provider-transport-fetch.test.ts

  • node scripts/run-vitest.mjs src/gateway/sessions-patch.test.ts src/agents/live-model-switch.test.ts src/commands/onboard-custom.test.ts src/commands/onboard-custom-config.test.ts src/agents/provider-transport-fetch.test.ts

  • node scripts/run-vitest.mjs src/agents/agent-command.live-model-switch.test.ts src/agents/acp-spawn.test.ts src/agents/google-simple-completion-stream.test.ts src/agents/simple-completion-transport.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/status.test.ts extensions/acpx/src/runtime.test.ts extensions/google/model-id.test.ts extensions/google/provider-models.test.ts packages/model-catalog-core/src/provider-model-id-normalization.test.ts packages/model-catalog-core/src/provider-model-id-normalize.test.ts extensions/codex/src/app-server/dynamic-tool-build.test.ts extensions/codex/src/app-server/thread-lifecycle.binding.test.ts extensions/codex/src/app-server/thread-lifecycle.test.ts

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/dynamic-tool-build.test.ts extensions/codex/src/app-server/thread-lifecycle.test.ts extensions/codex/src/app-server/thread-lifecycle.binding.test.ts extensions/codex/src/app-server/run-attempt.test.ts

  • node scripts/run-vitest.mjs src/gateway/sessions-patch.test.ts src/agents/live-model-switch.test.ts

  • node scripts/run-vitest.mjs src/commands/agent.test.ts

  • node scripts/run-vitest.mjs src/commands/onboard-custom-config.test.ts src/commands/onboard-custom.test.ts src/commands/onboard-non-interactive/local/auth-choice.test.ts

  • node scripts/run-vitest.mjs src/commands/configure.gateway-auth.prompt-auth-config.test.ts src/commands/model-picker.test.ts

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/event-projector.test.ts extensions/codex/src/app-server/run-attempt.test.ts

  • node scripts/run-vitest.mjs extensions/google/api.test.ts extensions/google/provider-registration.test.ts extensions/google/index.test.ts src/agents/embedded-agent-runner/model.test.ts

  • pnpm exec oxfmt --check extensions/codex/src/app-server/event-projector.ts extensions/codex/src/app-server/event-projector.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.extensions.json extensions/codex/src/app-server/event-projector.ts extensions/codex/src/app-server/event-projector.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.extensions.json extensions/google/api.ts extensions/google/provider-policy.ts extensions/google/provider-registration.ts extensions/google/api.test.ts extensions/google/provider-registration.test.ts extensions/google/index.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/embedded-agent-runner/model.provider-runtime.test-support.ts src/agents/embedded-agent-runner/model.test.ts

  • node scripts/run-tsgo.mjs -p tsconfig.extensions.json --incremental false

  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental false

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/commands/configure.gateway-auth.ts src/commands/configure.gateway-auth.prompt-auth-config.test.ts && git diff --check

  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental false

  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental false

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/agent-command.ts src/commands/agent-command.test-mocks.ts src/commands/agent.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/cli/program/register.onboard.ts src/commands/onboard-custom-config.ts src/commands/onboard-custom.ts src/commands/onboard-types.ts src/wizard/i18n/locales/en.ts src/wizard/i18n/locales/zh-CN.ts src/wizard/i18n/locales/zh-TW.ts src/commands/onboard-custom-config.test.ts src/commands/onboard-custom.test.ts src/commands/onboard-non-interactive/local/auth-choice.test.ts

  • pnpm docs:list

  • swift test --package-path apps/macos --filter VoiceWakeForwarderTests --filter TalkModeRuntimeSpeechTests --filter GatewayConnectionControlTests -> Swift Testing selected suites: 12 tests passed

  • isolated live API-key Codex harness: temp HOME/CODEX_HOME, CODEX_API_KEY from exported OPENAI_API_KEY, OPENCLAW_LIVE_CODEX_HARNESS_MODEL=openai/gpt-5.4-nano, node scripts/test-live.mjs --codex-harness -- src/gateway/gateway-codex-harness.live.test.ts -> 1 passed, 1 skipped, 154.60s

  • git diff --check origin/main...HEAD

  • node scripts/run-vitest.mjs src/agents/tools/cron-tool.test.ts

  • node scripts/run-vitest.mjs src/gateway/tool-resolution.exclude.test.ts src/gateway/tool-resolution.test.ts

  • node scripts/run-vitest.mjs src/agents/tools/sessions-spawn-tool.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/tools/cron-tool-canonicalize.ts src/agents/tools/cron-tool.test.ts

  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/gateway/tool-resolution.ts src/gateway/tool-resolution.exclude.test.ts src/gateway/tool-resolution.test.ts src/agents/tools/sessions-spawn-tool.test.ts

  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental false

  • git diff --check

  • /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode local -> clean after accepted fixes

Note: local full extension test-type graph crashed inside typescript-go with a Go SIGSEGV before TypeScript diagnostics; core prod/test typechecks covering the touched files passed.

  • node scripts/run-vitest.mjs src/agents/tools/message-tool.test.ts src/infra/outbound/message-action-normalization.test.ts src/infra/outbound/message-action-runner.send-validation.test.ts src/auto-reply/reply/strip-inbound-meta.test.ts
  • node scripts/run-vitest.mjs src/infra/outbound/message-action-runner.core-send.test.ts
  • pnpm exec oxfmt --check --threads=1 src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts src/auto-reply/reply/strip-inbound-meta.ts
  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts src/auto-reply/reply/strip-inbound-meta.ts src/auto-reply/reply/strip-inbound-meta.test.ts
  • /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode local -> clean: no accepted/actionable findings reported

clawsweeper Bot commented Jun 1, 2026

Contributor

Codex review: needs changes before merge. Reviewed June 1, 2026, 9:49 PM ET / 01:49 UTC.

Summary
The PR updates live model inference across provider routing, Responses payload/stream handling, Codex and ACPX runtime paths, session model/thinking state, macOS voice sends, custom onboarding, cron/message tools, docs, and tests.

PR surface: Source +823, Tests +1592, Docs +1, Other +67. Total +2483 across 76 files.

Reproducibility: yes. Source inspection of the latest PR head shows the Codex side-question path still uses model-unaware dynamic-tool loading, the Responses backfill still skips after prior reasoning content, Google routing still overrides explicit Generative AI API choices under google-vertex, and model reset clears the only live-switch flag.

Review metrics: 2 noteworthy metrics.

  • Public compatibility mode: 1 added (openai-responses). The new --custom-compatibility value changes the public onboarding/config contract and needs compatibility review before merge.
  • Known broad typecheck gap: 1 extension test-type graph crash reported. The PR body says the full extension test-type graph crashed before TypeScript diagnostics, so broad extension type coverage is not proven.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦞 diamond lobster
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
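
As a rough sketch of the "weaker of the two" rule, assuming the rank legend further down in this review is ordered from strongest to weakest (the names `RANKS` and `overallRank` are hypothetical, and the "off-meta tidepool" non-rating is left out):

```typescript
// Ranks from strongest to weakest, following the legend in this review.
// Assumption: the legend order is the strength order.
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is whichever of the two inputs sits later (weaker) in RANKS.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) >= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}
```

With the values reported above, `overallRank("diamond lobster", "unranked krab")` yields "unranked krab", which matches the Overall line.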

Rank-up moves:

  • [P2] Fix the four review findings with focused regression tests.
  • Refresh or replace the broad extension typecheck proof that crashed inside typescript-go.
  • Get maintainer owner review for the custom provider compatibility, auth/provider routing, and session-state upgrade behavior.

Risk before merge

  • [P1] Merging as-is can still fail Codex gpt-5.4-nano side-question threads because /btw dynamic tools keep the generic searchable loading mode.
  • [P1] Merging as-is can drop a final Responses assistant message or function call when a reasoning block arrives before a completed-response-only output item.
  • [P1] Merging as-is can route explicit google-generative-ai configs under a google-vertex provider through the Vertex transport, breaking existing non-Vertex Gemini/AI Studio-style setups.
  • [P1] Merging as-is can leave an active session running on the old pinned model after sessions.patch resets persisted selection to the default.
  • [P1] The PR adds a public custom-provider compatibility mode and changes provider/auth/session behavior across many surfaces, so owner review and upgrade proof remain important even after line-level fixes.
  • [P1] The PR body reports that the full extension test-type graph crashed inside typescript-go with a Go SIGSEGV before diagnostics, leaving a broad extension type-coverage gap.

Maintainer options:

  1. Fix the remaining runtime blockers (recommended)
    Apply the model-aware Codex loading, Responses backfill, Google transport, and live-switch reset repairs with focused regressions before merge.
  2. Require owner upgrade review
    After the line-level fixes, have maintainers explicitly review the custom provider compatibility mode, Google routing behavior, OpenAI/Codex auth selection, and session reset upgrade semantics.
  3. Split the safest fixes
    If the broad PR cannot converge quickly, pause this branch and split the already-proven narrow bug fixes into smaller owner-scoped PRs.
@clawsweeper automerge

Special instructions:
Fix the current review findings in `extensions/codex/src/app-server/side-question.ts`, `src/agents/openai-transport-stream.ts`, `extensions/google/provider-registration.ts`, and `src/gateway/sessions-patch.ts`; add or update focused regression tests for each path; do not broaden the PR beyond these repairs.

Next step before merge

  • The remaining blockers are concrete file-level repairs an automated worker can attempt, but the protected label and broad provider/session surface still require maintainer review afterward.

Security
Cleared: The diff touches provider fetch/config behavior but does not add dependency, workflow, secret, package-resolution, or supply-chain changes with a concrete security regression.

Review findings

  • [P2] Apply nano tool loading to side questions — extensions/codex/src/app-server/run-attempt.ts:598
  • [P1] Backfill completed items after reasoning blocks — src/agents/openai-transport-stream.ts:1492-1493
  • [P1] Honor explicit Generative AI configs before Vertex fallback — extensions/google/provider-registration.ts:73-75
Review details

Best possible solution:

Fix the four concrete runtime blockers, keep the new custom OpenAI Responses mode documented as a public compatibility addition, and require maintainer review/upgrade proof for the provider, auth, and session-state behavior before merge.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection of the latest PR head shows the Codex side-question path still uses model-unaware dynamic-tool loading, the Responses backfill still skips after prior reasoning content, Google routing still overrides explicit Generative AI API choices under google-vertex, and model reset clears the only live-switch flag.

Is this the best way to solve the issue?

No. The PR contains useful fixes and strong selected live proof, but it is not the best complete fix until the one-sided Codex path, completed-output backfill guard, Google transport precedence, and model-reset live-switch behavior are corrected.
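
To make the "one-sided Codex path" concern concrete, here is a minimal sketch of a model-aware loading resolver that every thread path could share. All names here (`PluginConfig`, `supportsToolSearch`, `resolveLoadingForModel`) are hypothetical, the "nano" substring check is an assumption standing in for a real capability lookup, and the real `resolveCodexDynamicToolsLoading` signature is not shown in this thread.

```typescript
// The two loading modes mentioned in this review: searchable (deferred via
// tool_search) and direct.
type DynamicToolsLoading = "searchable" | "direct";

// Hypothetical config shape for this sketch.
interface PluginConfig {
  dynamicToolsLoading?: DynamicToolsLoading;
}

// Assumption: a simple substring check stands in for a real capability lookup.
function supportsToolSearch(modelId: string): boolean {
  return !modelId.toLowerCase().includes("nano");
}

// Resolve the loading mode from config, but never defer tools for a model
// that cannot use tool_search.
function resolveLoadingForModel(config: PluginConfig, modelId: string): DynamicToolsLoading {
  if (!supportsToolSearch(modelId)) return "direct";
  return config.dynamicToolsLoading ?? "searchable";
}
```

If both the main run and the /btw side-question bridge call the same model-aware function with the active model id, a nano session should not see deferred tool loading on either path.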

Full review comments:

  • [P2] Apply nano tool loading to side questions — extensions/codex/src/app-server/run-attempt.ts:598
    The main Codex run now switches gpt-5.4-nano to direct dynamic tools, but /btw side-question threads still build their bridge with resolveCodexDynamicToolsLoading(input.pluginConfig) in side-question.ts. That path can still expose deferred tool_search behavior for the same nano model this PR marks as unable to use tool search, so nano sessions can pass normal turns and fail when the user asks a side question.
    Confidence: 0.88
  • [P1] Backfill completed items after reasoning blocks — src/agents/openai-transport-stream.ts:1492-1493
    This guard skips completed-output backfill as soon as any content block exists. If a stream emits a reasoning item before response.completed.response.output supplies the actual assistant message or function call, output.content.length is already nonzero and the final deliverable item is never appended.
    Confidence: 0.9
  • [P1] Honor explicit Generative AI configs before Vertex fallback — extensions/google/provider-registration.ts:73-75
    This branch routes every google-vertex provider model with api: "google-generative-ai" through the Vertex transport solely because of the provider id. Configs that explicitly preserved the Generative AI API with an AI Studio or proxy base URL will now send Vertex-shaped requests to a non-Vertex endpoint.
    Confidence: 0.86
  • [P1] Keep model resets pending until active runs reconcile — src/gateway/sessions-patch.ts:534
    Deleting liveModelSwitchPending on model: null means an active run that is still on the old pinned model will not be reconciled, because shouldSwitchToLiveModel returns early when the flag is absent. Current docs say user-driven sessions.patch model changes mark a pending live switch, so clearing the flag here can leave persisted default selection and active runtime selection diverged.
    Confidence: 0.84
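
The backfill finding above can be illustrated with a small sketch. It assumes a simplified content model (`ContentBlock`, `hasDeliverable`, and `backfillCompletedOutput` are hypothetical names, not the types in `openai-transport-stream.ts`) and shows a guard that checks for a deliverable block rather than for any block at all.

```typescript
// Simplified, hypothetical content model for this sketch.
type ContentBlock =
  | { type: "thinking"; text: string }
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string };

// A deliverable block is something the caller can act on: text or a tool call.
function hasDeliverable(content: ContentBlock[]): boolean {
  return content.some((block) => block.type === "text" || block.type === "toolCall");
}

// Append deliverable items from the completed response when streaming produced none.
function backfillCompletedOutput(
  content: ContentBlock[],
  completed: ContentBlock[],
): ContentBlock[] {
  // Guarding on content.length > 0 would return here after a lone reasoning block.
  if (hasDeliverable(content)) return content;
  const missing = completed.filter((block) => block.type !== "thinking");
  return [...content, ...missing];
}
```

In this sketch a stream that produced only a reasoning block still receives the final assistant message or function call from the completed response, while a stream that already delivered text is left untouched.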

Overall correctness: patch is incorrect
Overall confidence: 0.87

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b06dc1753765.

Label changes

Label justifications:

  • P2: The PR addresses normal-priority live model inference bugs across several surfaces, but it is not an emergency outage or security fix.
  • merge-risk: 🚨 compatibility: The diff adds a public custom-provider compatibility mode and changes provider routing behavior that can affect existing configs and upgrades.
  • merge-risk: 🚨 auth-provider: The diff changes OpenAI/Codex OAuth model selection plus Google/Azure provider transport selection.
  • merge-risk: 🚨 session-state: The diff changes persisted thinking overrides and live model switch session-state handling.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦞 diamond lobster and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR includes after-fix focused command output plus live Codex harness proof and live credentialed Azure Foundry canaries, though the remaining findings still need targeted regression proof after repair.
Evidence reviewed

PR surface:

Source +823, Tests +1592, Docs +1, Other +67. Total +2483 across 76 files.

Area       Files  Added  Removed    Net
Source        39    999      176   +823
Tests         30   1624       32  +1592
Docs           2      2        1     +1
Config         0      0        0      0
Generated      0      0        0      0
Other          5     74        7    +67
Total         76   2699      216  +2483

Acceptance criteria:

  • [P1] node scripts/run-vitest.mjs extensions/codex/src/app-server/side-question.test.ts extensions/codex/src/app-server/dynamic-tool-build.test.ts extensions/codex/src/app-server/thread-lifecycle.test.ts.
  • [P1] node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts -t "Azure Responses completed".
  • [P1] node scripts/run-vitest.mjs extensions/google/provider-registration.test.ts extensions/google/api.test.ts src/agents/embedded-agent-runner/model.test.ts.
  • [P1] node scripts/run-vitest.mjs src/gateway/sessions-patch.test.ts src/agents/live-model-switch.test.ts.
  • [P1] node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental false.

What I checked:

  • Repository policy applied: Root AGENTS.md and relevant scoped guides for extensions, ACPX, agents, agent tools, gateway, gateway server methods, outbound helpers, and docs were read; the provider/session/config/Codex compatibility review rules apply. (AGENTS.md:21, b06dc1753765)
  • Protected label: The supplied live PR context includes the protected maintainer label, so conservative cleanup must keep the PR open for maintainer handling. (8c8e400f9c21)
  • Codex nano fix is one-sided: Latest PR head uses the new model-aware dynamic-tool loading resolver in the main Codex run path, but the side-question path still calls the generic resolver without the active model id. (extensions/codex/src/app-server/side-question.ts:602, 8c8e400f9c21)
  • Codex dependency contract checked: Codex upstream exposes deferred dynamic tools through tool_search only when the model supports search tools, so model-aware disabling must cover every OpenClaw Codex thread path. (../codex/codex-rs/core/src/tools/spec_plan.rs:275, c955f730781d)
  • Responses backfill guard still skips mixed streams: Latest PR head returns from completed-output backfill whenever any prior content exists, so a reasoning item added before response.completed.response.output still suppresses the final assistant message or function call. (src/agents/openai-transport-stream.ts:1492, 8c8e400f9c21)
  • Google transport override remains too broad: Latest PR head routes api: "google-generative-ai" models through Vertex when model.provider === "google-vertex", even if the explicit API choice and base URL require the Generative AI transport. (extensions/google/provider-registration.ts:74, 8c8e400f9c21)
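
One way to picture the transport precedence the Google finding asks for is the sketch below. It is an assumption-heavy illustration: `ModelConfig`, `Transport`, and `pickTransport` are hypothetical, the `openai-completions` value is a placeholder for an OpenAI-compatible API choice, and the sketch assumes `api` is only set when the user chose it explicitly.

```typescript
// Hypothetical API identifiers; only "google-generative-ai" and the
// "google-vertex" provider id appear in this review.
type GoogleApi = "google-generative-ai" | "google-vertex" | "openai-completions";

interface ModelConfig {
  provider: string;
  // Assumption: undefined unless the user configured an API explicitly.
  api?: GoogleApi;
  baseUrl?: string;
}

type Transport = "vertex-native" | "generative-ai" | "openai-compatible";

// Explicit API choices win; the provider id is only a fallback signal.
function pickTransport(model: ModelConfig): Transport {
  if (model.api === "google-generative-ai") return "generative-ai";
  if (model.api === "openai-completions") return "openai-compatible";
  if (model.api === "google-vertex" || model.provider === "google-vertex") {
    return "vertex-native";
  }
  return "generative-ai";
}
```

Under these assumptions a baseUrl-only google-vertex provider still reaches the native Vertex transport, while a config that explicitly keeps the Generative AI API is not rerouted because of its provider id.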

Likely related people:

  • steipete: Recent current-main commits touch OpenAI Responses, Google Vertex/provider routing, session goals/state, and this PR branch also spans the same provider/session surfaces. (role: recent area contributor; confidence: high; commits: b23ace1d04ca, fba9eac7ebb7, 00d8d7ead059; files: src/agents/openai-transport-stream.ts, extensions/google/provider-registration.ts, src/gateway/sessions-patch.ts)
  • joshavant: Recent current-main Codex app-server commits touched native surfaces, sandbox execution, and missing turn completion around the same side-thread/runtime area. (role: adjacent Codex app-server owner; confidence: medium; commits: e0405ecc9bd6, ba06376c7955, 7cda26aa6c72; files: extensions/codex/src/app-server/side-question.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
  • udaymanish6: GitHub path history shows recent work on Codex side-question timeout behavior, which is adjacent to the remaining side-question dynamic-tool-loading gap. (role: recent Codex side-thread contributor; confidence: medium; commits: 0f18d52f16e3; files: extensions/codex/src/app-server/side-question.ts)
  • latensified: Recent OpenAI Responses stream work changed replay/id handling in the same transport file where completed-output backfill is being added. (role: recent OpenAI Responses contributor; confidence: medium; commits: 6653193fdb90; files: src/agents/openai-transport-stream.ts)
  • 1052326311: Recent current-main commits touched Google provider default API routing and gateway session patch auth-profile handling near two compatibility-sensitive surfaces in this PR. (role: recent provider/session contributor; confidence: medium; commits: b73e135f9730, 152f68d037af; files: extensions/google/provider-registration.ts, src/gateway/sessions-patch.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

openclaw-barnacle Bot added labels on Jun 1, 2026:
  • gateway (Gateway runtime)
  • app: macos (App: macos)

clawsweeper Bot added labels on Jun 1, 2026:
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

openclaw-barnacle Bot added the commands (Command implementations) label on Jun 1, 2026

clawsweeper Bot added labels on Jun 1, 2026:
  • rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.)
  • P2 (Normal backlog priority with limited blast radius.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • merge-risk: 🚨 auth-provider (🚨 May break OAuth, tokens, provider routing, model choice, or credentials.)
and removed the label:
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c653fb3f4e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}),
markLiveSwitchPending: true,
});
delete next.liveModelSwitchPending;

[P2] Keep model resets pending for live switches

When sessions.patch resets model to null while an agent is already running on a user-pinned non-default model, this clears the only flag that shouldSwitchToLiveModel checks before reconciling the active run. The persisted selection changes back to the configured default, but the active runner will not restart/switch because liveModelSwitchPending is deleted here; this contradicts the documented live-switch contract that user-driven sessions.patch model changes mark a pending live switch (docs/concepts/model-failover.md:342).
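
A minimal sketch of the behavior this comment describes, under stated assumptions: `SessionEntry`, `modelOverride`, and `applyModelPatch` are hypothetical, and `shouldSwitchToLiveModel` here is a simplified stand-in that only reproduces the early return on a missing flag, not the project's real function.

```typescript
// Hypothetical session shape for this sketch.
interface SessionEntry {
  modelOverride?: string;
  liveModelSwitchPending?: boolean;
}

// Apply a model patch; null resets the pin back to the configured default.
function applyModelPatch(entry: SessionEntry, model: string | null): SessionEntry {
  const next: SessionEntry = { ...entry };
  if (model === null) {
    delete next.modelOverride;
  } else {
    next.modelOverride = model;
  }
  // Mark the switch as pending whenever the selection changed, including a reset.
  if (next.modelOverride !== entry.modelOverride) {
    next.liveModelSwitchPending = true;
  }
  return next;
}

// Simplified stand-in: without the pending flag, the active run is never reconciled.
function shouldSwitchToLiveModel(
  entry: SessionEntry,
  activeModel: string,
  defaultModel: string,
): boolean {
  if (!entry.liveModelSwitchPending) return false;
  return (entry.modelOverride ?? defaultModel) !== activeModel;
}
```

In this reading, a reset keeps the flag set, so a run that is still on the old pinned model is told to switch to the default; deleting the flag on reset would make the second function return early instead.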

openclaw-barnacle Bot added the docs (Improvements or additions to documentation) label on Jun 1, 2026

clawsweeper Bot added labels on Jun 2, 2026:
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.)
  • status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.)
and removed the label:
  • rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.)

clawsweeper Bot added the rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.) label and removed the rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.) label on Jun 2, 2026

pigfoot commented Jun 2, 2026

I ran the missing live Azure credentialed canary on the latest head of this PR.

Setup:

  • PR: Fix live model inference edge cases #88946
  • Head tested: 810ef28557edcd7196e31b60a680ce03f817d71d
  • Runtime: isolated checkout under /tmp, isolated OPENCLAW_HOME / OPENCLAW_STATE_DIR / OPENCLAW_CONFIG_PATH
  • Production config was not modified.
  • Target: Azure Foundry/resource /openai/v1 Responses endpoint, redacted host
  • OpenClaw provider config shape:
    • provider/model: azure/gpt-5.5
    • api: azure-openai-responses
    • base URL shape: ${AZURE_FOUNDRY_BASE_URL}/openai/v1
    • credential source: ${AZURE_FOUNDRY_API_KEY}

No-tools canary:

prompt: Return exactly: AZURE_RESPONSES_CANARY_OK
final text: AZURE_RESPONSES_CANARY_OK
assistantTexts: ["AZURE_RESPONSES_CANARY_OK"]
winner: azure/gpt-5.5
fallbackUsed: false
stopReason: stop
durationMs: 53545

Tool-continuation canary:

prompt: First call the get_goal tool exactly once. After the tool result, return exactly: AZURE_RESPONSES_TOOL_CANARY_OK
final text: AZURE_RESPONSES_TOOL_CANARY_OK
assistantTexts: ["AZURE_RESPONSES_TOOL_CANARY_OK"]
toolSummary: { calls: 1, tools: ["get_goal"], failures: 0 }
winner: azure/gpt-5.5
fallbackUsed: false
stopReason: stop
durationMs: 22122

I also scanned the canary artifacts and session trajectory logs for the prior failure signatures:

negative-check: clean

The scan found no non_deliverable_terminal_turn, no FailoverError, no candidate_failed, no /v1 api-version rejection, and no Foundry Invalid value: '' typed-message rejection in these runs.

This is live credentialed Azure proof from the PR head that both a direct assistant-text turn and a tool-call continuation produce deliverable assistant text through azure-openai-responses without fallback.
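
For readers unfamiliar with the typed-message shape referenced in the PR summary, here is a minimal sketch of an explicit Responses input item with `input_text` content. The helper names (`toTypedInputMessage`, `buildResponsesInput`) are hypothetical and the role list is trimmed for the example; only the item shape itself (`type: "message"`, a role, and `input_text` parts) follows the OpenAI Responses input format.

```typescript
// One text part of a typed Responses input message.
interface InputTextPart {
  type: "input_text";
  text: string;
}

// An explicit, typed input message item (roles trimmed for this sketch).
interface InputMessageItem {
  type: "message";
  role: "user" | "system" | "developer";
  content: InputTextPart[];
}

// Wrap plain text in a typed message item instead of sending a bare string.
function toTypedInputMessage(role: InputMessageItem["role"], text: string): InputMessageItem {
  return { type: "message", role, content: [{ type: "input_text", text }] };
}

// Build the `input` array for a single user prompt.
function buildResponsesInput(prompt: string): InputMessageItem[] {
  return [toTypedInputMessage("user", prompt)];
}
```

For the no-tools canary prompt above, `buildResponsesInput` would produce a single item whose `content` holds one `input_text` part, rather than an untyped message that a stricter endpoint might reject.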

steipete commented Jun 2, 2026

Contributor Author

Land-ready proof for head e02e62c6516a243b51bb6aeace78e906501ff546.

Local/source proof:

  • node scripts/run-vitest.mjs src/gateway/tool-resolution.exclude.test.ts src/gateway/tool-resolution.test.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental false
  • node scripts/run-vitest.mjs src/agents/tools/message-tool.test.ts src/infra/outbound/message-action-normalization.test.ts src/infra/outbound/message-action-runner.send-validation.test.ts src/auto-reply/reply/strip-inbound-meta.test.ts
  • node scripts/run-vitest.mjs src/infra/outbound/message-action-runner.core-send.test.ts
  • pnpm exec oxfmt --check --threads=1 src/gateway/tool-resolution.exclude.test.ts src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts src/auto-reply/reply/strip-inbound-meta.ts
  • node scripts/run-oxlint.mjs --tsconfig config/tsconfig/oxlint.core.json src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts src/auto-reply/reply/strip-inbound-meta.ts src/auto-reply/reply/strip-inbound-meta.test.ts
  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental false
  • git diff --check
  • Autoreview rerun after the message-tool sanitizer fix: no accepted/actionable findings.

Live/provider proof:

CI:

  • GitHub status rollup for this exact head reports no failed checks and no pending checks.

Known gap:

Labels

  • agents (Agent runtime and tooling)
  • app: macos (App: macos)
  • cli (CLI command changes)
  • commands (Command implementations)
  • docs (Improvements or additions to documentation)
  • extensions: acpx
  • extensions: codex
  • extensions: google
  • gateway (Gateway runtime)
  • maintainer (Maintainer-authored PR)
  • merge-risk: 🚨 auth-provider (🚨 May break OAuth, tokens, provider routing, model choice, or credentials.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • size: XL
  • status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.)
