
fix(agents): re-probe single-provider primary during cooldown#90717

Merged
sallyom merged 1 commit into
openclaw:mainfrom
849261680:fix/90702-single-provider-cooldown-reprobe
Jun 5, 2026

@849261680 849261680 commented Jun 5, 2026

Contributor

Summary

Fixes #90702

With fallbacks: [], a rate- or subscription-limited primary stayed suspended until the provider-reported blockedUntil timestamp actually arrived, which can be days out for subscription caps (e.g. "Next reset in 6 days"). The root cause: shouldProbePrimaryDuringCooldown returned false on !hasFallbackCandidates before checking any recovery condition, so a single-provider agent stayed silent for days even after the rolling cap recovered.

What changed

  • Split the early-return guard in shouldProbePrimaryDuringCooldown (src/agents/model-fallback.ts): a non-primary candidate still returns false immediately, but a single-provider primary (!hasFallbackCandidates) now returns true (probe allowed) while the 30-second throttle slot is open. A single-provider primary has no fallback chain to prefer, so "is the primary callable yet?" is a recovery question independent of fallback configuration.
  • Simplified the billing branch in resolveCooldownDecision: removed the duplicated single-provider probe check (shouldProbeSingleProviderBilling); the unified shouldProbe now covers single-provider recovery for all reasons including billing. Net: one fewer special-case branch.
  • Updated the existing case that asserted the primary was never invoked with fallbacks: [] + rate-limit cooldown — it now expects the primary to be probed and succeed.
  • Added a regression case proving a far-future subscription_limit block with fallbacks: [] yields a probe attempt rather than suspend_lanes, while the 30-second probe throttle is still honored.

Multi-fallback setups are unchanged: they keep preferring the fallback chain and only probe the primary near cooldown expiry.
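
The three paths described above can be sketched as a small gate function. This is a minimal illustration, not the code in src/agents/model-fallback.ts: only isPrimary, hasFallbackCandidates and the PROBE_MARGIN_MS name appear in this PR, while throttleOpen, soonestCooldownExpiry, the margin value and the position of the throttle check are assumptions made for the example.

```typescript
// Sketch only. The margin value is assumed; the PR names the constant but not its size.
const PROBE_MARGIN_MS = 2 * 60 * 1000;

interface ProbeGateParams {
  isPrimary: boolean;
  hasFallbackCandidates: boolean;
  throttleOpen: boolean; // hypothetical: true when no probe ran in the last 30 seconds
  now: number; // epoch milliseconds
  soonestCooldownExpiry: number; // hypothetical: earliest cooldown end, epoch milliseconds
}

function shouldProbePrimarySketch(p: ProbeGateParams): boolean {
  // Non-primary candidates are never probed during cooldown.
  if (!p.isPrimary) return false;
  // Every probe has to wait for an open throttle slot.
  if (!p.throttleOpen) return false;
  // Single-provider primary: there is no fallback chain to prefer, so probe.
  if (!p.hasFallbackCandidates) return true;
  // Multi-fallback: keep preferring fallbacks until the cooldown is about to expire.
  return p.now >= p.soonestCooldownExpiry - PROBE_MARGIN_MS;
}

const sixDaysMs = 6 * 24 * 60 * 60 * 1000;
const gateBase = { isPrimary: true, throttleOpen: true, now: 0, soonestCooldownExpiry: sixDaysMs };

// Same far-future block, differing only in whether fallbacks exist.
const singleProviderProbe = shouldProbePrimarySketch({ ...gateBase, hasFallbackCandidates: false });
const multiFallbackProbe = shouldProbePrimarySketch({ ...gateBase, hasFallbackCandidates: true });
```

With a block six days out, the single-provider case yields a probe while the multi-fallback case does not, which mirrors the split this PR introduces.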

Verification

Focused suites pass: src/agents/model-fallback.probe.test.ts (14 tests), model-fallback.test.ts (76), auth-profiles.cooldown-auto-expiry.test.ts (6). tsgo -p tsconfig.core.json exits 0 and oxfmt --check is clean. These are supplemental to the runtime proof below.

Real behavior proof

Behavior addressed: With a single configured provider (fallbacks: []) whose only auth profile carries a far-future subscription_limit block, the model-fallback layer now re-probes the primary and serves the recovered call, instead of suspending lanes until blockedUntil.

Real environment tested: Local OpenClaw runtime, Node 22.22.2, isolated OPENCLAW_HOME, driving the production runWithModelFallback path. A real on-disk auth-profile store was seeded with the exact reported state (blockedUntil = now + 6 days, blockedReason: "subscription_limit", blockedSource: "wham", fallbacks: []); the real auth runtime loaded it (isProfileInCooldown(openai:acct-123)=true).
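
For readers who want to picture the seeded state, here is a minimal sketch of a profile entry with the reported values. The field names blockedUntil, blockedReason and blockedSource come from this PR; the interface, the helper names and the in-memory representation are hypothetical and do not reflect the real auth-profile store layout.

```typescript
// Sketch only: the shape and helpers are assumptions, the field names are from the PR text.
interface SeededProfile {
  blockedUntil: number; // epoch milliseconds
  blockedReason: string;
  blockedSource: string;
}

const SIX_DAYS_MS = 6 * 24 * 60 * 60 * 1000;

// Builds the reported state: a subscription_limit block that ends six days from `now`.
function seedBlockedProfile(now: number): SeededProfile {
  return {
    blockedUntil: now + SIX_DAYS_MS,
    blockedReason: "subscription_limit",
    blockedSource: "wham",
  };
}

// Assumed rule: a profile counts as blocked while blockedUntil lies in the future.
function isBlockedAt(profile: SeededProfile, now: number): boolean {
  return profile.blockedUntil > now;
}

const seededAt = Date.UTC(2026, 5, 5); // 5 June 2026 (month index 5 is June)
const seededProfile = seedBlockedProfile(seededAt);
const blockedAtSeed = isBlockedAt(seededProfile, seededAt);
const blockedAfterFiveDays = isBlockedAt(seededProfile, seededAt + 5 * 24 * 60 * 60 * 1000);
const blockedAfterSevenDays = isBlockedAt(seededProfile, seededAt + 7 * 24 * 60 * 60 * 1000);
```

Under this rule the profile stays blocked for the whole six-day window, which is why the decision layer rather than the stored timestamp has to allow recovery probes.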

Exact steps or command run after this patch: on the fixed branch, OPENCLAW_HOME=$(mktemp -d) node --import tsx repro-90702.mts; then for contrast git checkout origin/main -- src/agents/model-fallback.ts and rerun, then git checkout HEAD -- src/agents/model-fallback.ts.

Evidence after fix: console output / redacted runtime log captured from the local node OpenClaw runtime on the fixed branch — the primary is probed and serves the call:

[model-fallback/decision] decision=probe_cooldown_candidate requested=openai/gpt-4.1-mini candidate=openai/gpt-4.1-mini reason=rate_limit next=none
>>> upstream invoked: openai/gpt-4.1-mini (rolling cap recovered) -> OK
[model-fallback/decision] decision=candidate_succeeded requested=openai/gpt-4.1-mini candidate=openai/gpt-4.1-mini next=none
>>> runWithModelFallback result = "RECOVERED-OK"

Same seeded state on current main (before the change) — the primary is never called, matching the reporter's logs:

[model-fallback/decision] decision=skip_candidate requested=openai/gpt-4.1-mini candidate=openai/gpt-4.1-mini reason=rate_limit next=none detail=Provider openai is in cooldown (suspending lanes)
FallbackSummaryError: All models failed (1): openai/gpt-4.1-mini: Provider openai is in cooldown (suspending lanes) (rate_limit)

Observed result after fix: the cooldown decision flips from skip_candidate / suspend_lanes (the bug: indefinite silence and All models failed) to probe_cooldown_candidate; the primary is actually invoked and the recovered rolling cap resumes serving (RECOVERED-OK). The 30-second probe throttle is still honored, so recovery probing cannot hammer the upstream.
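
The interplay between recovery probing and the 30-second throttle can be sketched as follows. The decision shapes { type: "attempt" } with markProbe: true and { type: "suspend_lanes" } are taken from the review notes on this PR; the ProbeThrottle class, its method and the constant name are hypothetical and only illustrate a per-provider throttle.

```typescript
// Sketch only: a per-provider throttle that allows one probe per window.
type CooldownDecision =
  | { type: "attempt"; markProbe: true }
  | { type: "suspend_lanes" };

const PROBE_THROTTLE_MS = 30 * 1000; // 30-second window from the PR; the name is assumed

class ProbeThrottle {
  private lastProbeAt = new Map<string, number>();

  // Allows a probe if none was recorded for this provider within the window,
  // and records the probe time when it does.
  decide(provider: string, now: number): CooldownDecision {
    const last = this.lastProbeAt.get(provider);
    if (last !== undefined && now - last < PROBE_THROTTLE_MS) {
      return { type: "suspend_lanes" };
    }
    this.lastProbeAt.set(provider, now);
    return { type: "attempt", markProbe: true };
  }
}

const probeThrottle = new ProbeThrottle();
const firstDecision = probeThrottle.decide("openai", 0).type; // no earlier probe
const tooSoonDecision = probeThrottle.decide("openai", 10 * 1000).type; // 10 s after the probe
const laterDecision = probeThrottle.decide("openai", 30 * 1000).type; // window has elapsed
```

In this sketch a blocked single-provider primary is retried at most once per window, so an upstream that is still capped sees one probe every 30 seconds rather than one per incoming request.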

What was not tested: a fully live OAuth/WHAM run against a real ChatGPT subscription in a hard-capped state. The decision pipeline and auth-profile cooldown reads exercised here are the production code paths; only the auth-profile store contents were seeded to reproduce the reported block.

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed June 5, 2026, 1:29 PM ET / 17:29 UTC.

Summary
The PR lets a primary model with no fallback candidates re-probe during an open cooldown throttle slot and updates model-fallback regression tests for rate-limit and subscription-limit cooldowns.

PR surface: Source +6, Tests +42. Total +48 across 2 files.

Reproducibility: yes. The current-main source and latest release both show the !hasFallbackCandidates early return before the probe logic, and the PR body supplies before/after terminal output for the production fallback path with seeded auth-profile state.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none.

Risk before merge

  • [P2] A fully live hard-capped ChatGPT subscription recovery cycle was not exercised; the supplied proof covers the production fallback decision path with seeded on-disk auth-profile state and redacted terminal output.

Maintainer options:

  1. Decide the mitigation before merge
    Land the narrow fallback decision change with its regression coverage after normal CI, keeping the existing per-provider 30-second throttle and multi-fallback reset-window behavior intact.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • No automated repair is needed; the patch is reviewable as-is and the remaining action is maintainer acceptance plus normal merge gates.

Security
Cleared: The diff only changes fallback cooldown control flow and tests; it does not add dependencies, workflows, secret handling, or new code-execution paths.

Review details

Best possible solution:

Land the narrow fallback decision change with its regression coverage after normal CI, keeping the existing per-provider 30-second throttle and multi-fallback reset-window behavior intact.

Do we have a high-confidence way to reproduce the issue?

Yes. The current-main source and latest release both show the !hasFallbackCandidates early return before the probe logic, and the PR body supplies before/after terminal output for the production fallback path with seeded auth-profile state.

Is this the best way to solve the issue?

Yes. Reusing the existing cooldown throttle for no-fallback primaries is narrower than changing stored provider reset timestamps or adding a config knob, and it preserves the multi-fallback preference logic.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 21aa297434e4.

Label changes

Label changes:

  • add P1: The linked bug can keep single-provider agents silent for days after a subscription cap, blocking scheduled and channel replies for real users.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal runtime output from the production fallback path with a seeded on-disk auth-profile store, plus contrasting current-main output.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

  • P1: The linked bug can keep single-provider agents silent for days after a subscription cap, blocking scheduled and channel replies for real users.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal runtime output from the production fallback path with a seeded on-disk auth-profile store, plus contrasting current-main output.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal runtime output from the production fallback path with a seeded on-disk auth-profile store, plus contrasting current-main output.
Evidence reviewed

PR surface:

Source +6, Tests +42. Total +48 across 2 files.

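| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 1 | 15 | 9 | +6 |
| Tests | 1 | 56 | 14 | +42 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 2 | 71 | 23 | +48 |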

What I checked:

  • Current-main bug still present: On current main, shouldProbePrimaryDuringCooldown returns false when !params.hasFallbackCandidates before checking the throttle or recovery conditions, so a single-provider primary cannot re-probe while its profile has an active cooldown. (src/agents/model-fallback.ts:1063, 21aa297434e4)
  • Latest release has the same behavior: The latest release tag still has the !params.hasFallbackCandidates early return, so the central bug is not already shipped-fixed. (src/agents/model-fallback.ts:1060, 2e08f0f4221f)
  • PR merge result fixes the gate narrowly: In the PR merge result, non-primary candidates still return false, but a primary with no fallback candidates returns true after the existing probe throttle opens; multi-fallback logic still proceeds to the existing soonest-cooldown checks. (src/agents/model-fallback.ts:1063, 93a76ad466df)
  • Regression coverage covers the reported shape: The added regression case seeds a far-future subscription_limit block with fallbacks: [], expects an attempt with markProbe: true, and verifies the recent-probe throttle still suspends. (src/agents/model-fallback.probe.test.ts:363, 93a76ad466df)
  • Runtime-path coverage checks the actual fallback call: The updated runWithModelFallback case uses a single configured primary, far-future cooldown, and fallbacksOverride: [], then expects the primary run to be called with allowTransientCooldownProbe: true. (src/agents/model-fallback.probe.test.ts:727, 93a76ad466df)
  • OpenClaw auth-store contract supports the repro: OpenClaw reads WHAM usage data, stores active WHAM reset windows as blockedUntil with blockedReason: "subscription_limit", and resolves active blockedUntil windows as rate_limit, matching the linked bug's stored state. (src/agents/auth-profiles/usage.ts:82, 21aa297434e4)

Likely related people:

  • steipete: git shortlog shows the heaviest recent authorship on src/agents/model-fallback.ts and the probe tests, with recent adjacent refactors and test maintenance in this area. (role: recent area contributor; confidence: high; commits: cb5bb9b936bb, 6a87d6e81426, dcc3392a1a40; files: src/agents/model-fallback.ts, src/agents/model-fallback.probe.test.ts)
  • altaywtf: The single-provider billing cooldown probe path was introduced in commit 0669b0d, and later same-provider cooldown probe work lists the same handle as co-author/reviewer. (role: adjacent behavior owner; confidence: medium; commits: 0669b0ddc265, 048e25c2b21d; files: src/agents/model-fallback.ts, src/agents/model-fallback.probe.test.ts)
  • sebslight: The shouldProbePrimaryDuringCooldown helper was extracted in commit d224776, and the original primary cooldown recovery PR lists this handle as reviewer/co-author. (role: helper extraction and reviewer; confidence: medium; commits: d224776ffbb1, 39bb1b33222e; files: src/agents/model-fallback.ts, src/agents/model-fallback.probe.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 5, 2026
@849261680

Contributor Author

Source-level behavior proof

This fix is a pure control-flow change in shouldProbePrimaryDuringCooldown. Per the ClawSweeper review, both the bug and the fix are reproducible from source. A multi-day Codex subscription cap exhaustion and recovery cycle cannot easily be reproduced in a live environment.

Before (origin/main, src/agents/model-fallback.ts:1063):

if (!params.isPrimary || !params.hasFallbackCandidates) {
  return false;
}

fallbacks: [] → hasFallbackCandidates=false → returns false → primary never re-probed → agent silent for days.

After (this PR):

if (!params.isPrimary) {
  return false;
}
// Single-provider recovery probe — #90702
if (!params.hasFallbackCandidates) {
  return true;  // probe allowed when throttle slot is open
}

fallbacks:[] → returns true → primary re-probed on 30s throttle → recovers when upstream is callable.

Unchanged paths verified:

  • Non-primary (isPrimary=false): still returns false immediately.
  • Multi-fallback (hasFallbackCandidates=true): still goes through soonest / PROBE_MARGIN_MS logic unchanged.
  • Billing branch: simplified but equivalent — shouldProbe now handles single-provider recovery for all reasons.

Test coverage (14 passed):

  • re-probes a single-provider primary blocked by a far-future subscription_limit (#90702) — unit-level resolveCooldownDecision returns { type: "attempt" } not { type: "suspend_lanes" }.
  • re-probes a single-provider rate-limited primary instead of suspending — integration-level runWithModelFallback calls model run with allowTransientCooldownProbe: true.
  • 30s throttle honored: test confirms a recent probe key results in suspend_lanes.

This is a proof: override case — maintainer verification requested.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 5, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. proof: supplied External PR includes structured after-fix real behavior proof. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 5, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. P1 High-priority user-facing bug, regression, or broken workflow. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels Jun 5, 2026
@sallyom sallyom self-assigned this Jun 5, 2026

sallyom commented Jun 5, 2026

Contributor

Land-ready maintainer review:

  • Reviewed PR 90717 for #90702 ("[Bug]: blockedUntil for subscription_limit set far in the future never re-probes when no fallback is configured") as a narrow agent fallback/auth cooldown fix.
  • No review findings found.
  • Verified current checks are green, including Real behavior proof and agent-runtime-boundary.
  • Local proof run on PR head: node scripts/run-vitest.mjs src/agents/model-fallback.probe.test.ts.
  • Local whitespace proof: git diff --check.
  • Stale-base proof: cherry-picked 6fa95d6143dc217eea6d0db950030788098d8303 cleanly onto current origin/main (c965141d67) and reran node scripts/run-vitest.mjs src/agents/model-fallback.probe.test.ts there.
  • Dependency contract checked directly in sibling Codex source: ../codex/codex-rs/protocol/src/protocol.rs, ../codex/codex-rs/backend-client/src/client.rs, and ../codex/codex-rs/app-server/README.md for rate-limit reset/window semantics.

Known gap: I did not personally run a live OAuth/WHAM exhausted account; the PR’s seeded on-disk proof plus focused tests and CI are sufficient for this narrow merge.

@sallyom sallyom merged commit 6da3b1f into openclaw:main Jun 5, 2026
288 of 312 checks passed
frankhli843 added a commit to gemmaclaw/gemmaclaw that referenced this pull request Jun 6, 2026
Selective sync from openclaw/openclaw. Applied one self-contained runtime fix;
remaining recent high-value upstream fixes were assessed and found inapplicable
to the fork's older/refactored baseline.

- Agents/model-fallback: re-probe single-provider primary during cooldown so a
  fallbacks:[] setup (the common local-model configuration) recovers from
  rate/subscription caps without waiting for a far-future provider-reported
  reset. The existing 30s probe throttle still gates recovery probes.
  (upstream 6da3b1f, openclaw#90717, fixes openclaw#90702)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 6, 2026
…aw#90717)

Fixes openclaw#90702.

Allow a single-provider primary to periodically probe through the existing cooldown throttle even when no fallback chain is configured. This lets WHAM/subscription-limit cooldown state recover without waiting for a far-future provider reset timestamp.

Verified:
- node scripts/run-vitest.mjs src/agents/model-fallback.probe.test.ts
- git diff --check
- cherry-pick onto current origin/main and rerun focused regression
849261680 added a commit to 849261680/openclaw that referenced this pull request Jun 7, 2026
wangmiao0668000666 pushed a commit to wangmiao0668000666/openclaw that referenced this pull request Jun 9, 2026
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request Jun 9, 2026
…26.6.5) (#963)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.1` → `2026.6.5` |

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.6.5`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202665)

[Compare Source](openclaw/openclaw@v2026.6.1...v2026.6.5)

##### Highlights

- QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw `<thinking>` content from leaking into channel replies. ([#&#8203;89913](openclaw/openclaw#89913), [#&#8203;90132](openclaw/openclaw#90132)) Thanks [@&#8203;openperf](https://github.com/openperf).
- MCP tool results now coerce `resource_link`, `resource`, `audio`, malformed image, and future non-text/image blocks at the materialize boundary, preventing Anthropic 400s and poisoned session history after a tool returns richer MCP content. ([#&#8203;90710](openclaw/openclaw#90710), [#&#8203;90728](openclaw/openclaw#90728)) Thanks [@&#8203;RanSHammer](https://github.com/RanSHammer) and [@&#8203;849261680](https://github.com/849261680).
- Anthropic extended-thinking sessions recover after prompt-cache expiry or Gateway restart because stream start events wait for `message_start`, letting pre-generation signature errors trigger the existing recovery retry. ([#&#8203;90667](openclaw/openclaw#90667), [#&#8203;90697](openclaw/openclaw#90697)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Parallel is now a bundled `web_search` provider with `PARALLEL_API_KEY` discovery, guarded endpoint handling, cache-safe session ids, onboarding picker support, and docs. ([#&#8203;85158](openclaw/openclaw#85158)) Thanks [@&#8203;NormallyGaussian](https://github.com/NormallyGaussian).
- Google Vertex ADC users get static catalog rows and runtime model resolution again, while single-provider cooldown recovery and memory adapter status checks are more reliable. ([#90506](openclaw/openclaw#90506), [#90609](openclaw/openclaw#90609), [#90717](openclaw/openclaw#90717), [#90816](openclaw/openclaw#90816)) Thanks [@849261680](https://github.com/849261680).
- Matrix can preflight voice notes before mention gating, preserve thread reads/replies through Matrix relations pagination, and carry QA coverage for voice and thread flows. ([#&#8203;78016](openclaw/openclaw#78016), [#&#8203;90415](openclaw/openclaw#90415))
- Auth and plugin install state is more durable: auth profiles now live in SQLite, official npm plugin install records keep their trusted pins, and prerelease fallback integrity checks avoid carrying stale integrity forward. ([#&#8203;89102](openclaw/openclaw#89102), [#&#8203;88585](openclaw/openclaw#88585))
- macOS node mode no longer silently self-reconnects away from a healthy direct Gateway session, reducing unexpected companion app session churn. ([#&#8203;90668](openclaw/openclaw#90668), [#&#8203;90815](openclaw/openclaw#90815)) Thanks [@&#8203;vrurg](https://github.com/vrurg).
- Upgrade and service paths are safer: cron legacy JSON stores migrate during doctor preflight, service env placeholders no longer mask state-dir secrets, WhatsApp startup waits are bounded, and disabled WhatsApp accounts tear down on config reload. ([#&#8203;90072](openclaw/openclaw#90072), [#&#8203;90208](openclaw/openclaw#90208), [#&#8203;90277](openclaw/openclaw#90277), [#&#8203;90488](openclaw/openclaw#90488), [#&#8203;90486](openclaw/openclaw#90486), [#&#8203;87951](openclaw/openclaw#87951), [#&#8203;87965](openclaw/openclaw#87965)) Thanks [@&#8203;MonkeyLeeT](https://github.com/MonkeyLeeT), [@&#8203;sallyom](https://github.com/sallyom), [@&#8203;mcaxtr](https://github.com/mcaxtr), and [@&#8203;MukundaKatta](https://github.com/MukundaKatta).

##### Changes

- Search/providers: add the Parallel bundled web-search plugin, live provider tests, registration contracts, onboarding/docs wiring, and guarded `api.parallel.ai/v1/search` support. ([#&#8203;85158](openclaw/openclaw#85158)) Thanks [@&#8203;NormallyGaussian](https://github.com/NormallyGaussian).
- Matrix/channels: add voice-message preflight and thread-aware read/reply behavior, including Matrix QA scenario wiring and docs for voice-message behavior. ([#&#8203;78016](openclaw/openclaw#78016), [#&#8203;90415](openclaw/openclaw#90415))
- Skills/ClawHub: install ClawHub skills backed by GitHub repositories through the resolved install API, download the pinned GitHub commit, keep install-policy checks, and report install telemetry after success. ([#&#8203;90478](openclaw/openclaw#90478)) Thanks [@&#8203;Patrick-Erichsen](https://github.com/Patrick-Erichsen).
- Google Chat/channels: add native approval card actions and click handling so Google Chat approvals use platform-native cards instead of generic message flow.
- Mobile: Android provider/model screens now surface expiring, unavailable, unresolved, and attention states more clearly, while iOS settings and Talk tabs keep diagnostics, gateway rows, attachment labels, and unavailable Talk controls reachable.
- Memory: QMD search can use the new rerank toggle, and memory adapter status uses the resolved default model identity when checking plain status. ([#&#8203;61834](openclaw/openclaw#61834))
- Docs/tooling: add Parallel search docs, refresh weather-skill guidance toward `web_fetch`, clarify legacy `openai-codex` auth, document release/test helper scripts, and tighten changed-test routing docs for CI/debugging work. ([#&#8203;90028](openclaw/openclaw#90028), [#&#8203;90250](openclaw/openclaw#90250)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Release/process: switch release trains to `YYYY.M.PATCH` monthly patch numbering, keep pre-transition tags compatible, and pin the June 2026 floor at `2026.6.5` after the published beta.
- Platform maintenance: refresh Android, Swift/macOS, Docker, CodeQL, Buildx, Docker build/push, and Codex Action dependencies for this release train. ([#&#8203;74980](openclaw/openclaw#74980), [#&#8203;81757](openclaw/openclaw#81757), [#&#8203;86481](openclaw/openclaw#86481), [#&#8203;86483](openclaw/openclaw#86483), [#&#8203;90601](openclaw/openclaw#90601))
- QQBot: add `/bot-group-allways on|off` slash command (with named-account and default-account support) to toggle whether group messages require an `@mention` before the bot replies, and clear the runtime config snapshot after the write so the new account-level `defaultRequireMention` takes effect immediately without restart. ([#&#8203;91423](openclaw/openclaw#91423)) Thanks [@&#8203;cxyhhhhh](https://github.com/cxyhhhhh).

##### Fixes

- Channel content boundaries: QQBot now strips reasoning/thinking tags before sending, preserving final answers while hiding internal model narration from users. ([#&#8203;89913](openclaw/openclaw#89913), [#&#8203;90132](openclaw/openclaw#90132)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Agents/MCP/providers: coerce non-text/image MCP tool-result blocks before they reach provider converters, preserving valid images and turning richer MCP content into text instead of malformed image blocks. ([#&#8203;90710](openclaw/openclaw#90710), [#&#8203;90728](openclaw/openclaw#90728)) Thanks [@&#8203;RanSHammer](https://github.com/RanSHammer) and [@&#8203;849261680](https://github.com/849261680).
- Anthropic/Codex/ACP/agent recovery: defer Anthropic stream start events until `message_start`, strip stale compaction thinking signatures before Anthropic replay, detect unsigned thinking-only stalls, refresh prompt fences after compaction writes, reject empty completion handoffs, preserve parent streaming-off overrides/shared progress commentary, forward heartbeat metadata to context-engine hooks, and cover Codex session/thread migration edge cases. ([#&#8203;90667](openclaw/openclaw#90667), [#&#8203;90697](openclaw/openclaw#90697), [#&#8203;90163](openclaw/openclaw#90163), [#&#8203;90108](openclaw/openclaw#90108), [#&#8203;89874](openclaw/openclaw#89874), [#&#8203;89505](openclaw/openclaw#89505), [#&#8203;90632](openclaw/openclaw#90632), [#&#8203;89302](openclaw/openclaw#89302), [#&#8203;90729](openclaw/openclaw#90729), [#&#8203;90317](openclaw/openclaw#90317), [#&#8203;90319](openclaw/openclaw#90319)) Thanks [@&#8203;openperf](https://github.com/openperf), [@&#8203;100yenadmin](https://github.com/100yenadmin), and [@&#8203;ooiuuii](https://github.com/ooiuuii).
- Provider/model resolution: preserve Google Vertex ADC auth markers in generated catalogs, re-probe a single-provider primary after cooldown, share Codex model visibility, fail closed for unknown model auth, preserve Codex alias availability, keep unresolved profile refs unknown, and avoid resolving auth while listing models. ([#90506](openclaw/openclaw#90506), [#90609](openclaw/openclaw#90609), [#90717](openclaw/openclaw#90717), [#90702](openclaw/openclaw#90702)) Thanks [@849261680](https://github.com/849261680).
- Gateway/macOS/mobile: avoid duplicate Gateway probe warnings by identity, rate-limit node pairing requests while preserving paired-node reconnects, keep macOS node mode on a healthy direct Gateway session, keep iOS diagnostics and gateway rows reachable, and avoid Linux ARM Gradle resource tasks during Android builds. ([#85791](openclaw/openclaw#85791), [#90147](openclaw/openclaw#90147), [#90668](openclaw/openclaw#90668), [#90815](openclaw/openclaw#90815)) Thanks [@giodl73-repo](https://github.com/giodl73-repo) and [@vrurg](https://github.com/vrurg).
- TUI/chat/Workboard/auto-reply: optimistic user messages stay stable across stale history reloads, runId reassignment, and abort windows instead of disappearing, jumping, or lingering as ghost rows; Workboard stale lifecycle bulk updates no longer overwrite newer status/provenance; message-tool sends now count as delivery. ([#86205](openclaw/openclaw#86205), [#89600](openclaw/openclaw#89600), [#88592](openclaw/openclaw#88592), [#90123](openclaw/openclaw#90123)) Thanks [@RomneyDa](https://github.com/RomneyDa).
- Cron/update/service env: doctor config preflight now migrates legacy cron JSON stores into SQLite before runtime reads, service env planning skips unresolved placeholders that would mask state-dir `.env` values, and session transcript rewrites keep registry markers/discriminants consistent. ([#90072](openclaw/openclaw#90072), [#90208](openclaw/openclaw#90208), [#90277](openclaw/openclaw#90277), [#90488](openclaw/openclaw#90488)) Thanks [@MonkeyLeeT](https://github.com/MonkeyLeeT) and [@sallyom](https://github.com/sallyom).
- Security/config/tooling: guard MCP HTTP redirects, protect global agent config defaults, and keep release/test/tooling proof failures bounded and explicit. ([#89732](openclaw/openclaw#89732), [#90145](openclaw/openclaw#90145))
- Channels: WhatsApp restarts when per-account config changes, bounds background startup waits, closes failed sockets, and preserves reconnect behavior; Mattermost slash commands keep their state on `globalThis`; Feishu streaming cards preserve full merged content; voice-call tracks Twilio streams after connect; ClickClack reply tools respect `toolsAllow`. ([#87951](openclaw/openclaw#87951), [#87965](openclaw/openclaw#87965), [#90486](openclaw/openclaw#90486), [#68113](openclaw/openclaw#68113), [#90534](openclaw/openclaw#90534), [#90181](openclaw/openclaw#90181), [#90607](openclaw/openclaw#90607), [#89500](openclaw/openclaw#89500)) Thanks [@MukundaKatta](https://github.com/MukundaKatta), [@mcaxtr](https://github.com/mcaxtr), [@infoanton](https://github.com/infoanton), [@mushuiyu886](https://github.com/mushuiyu886), and [@sahibzada-allahyar](https://github.com/sahibzada-allahyar).
- Feishu: retry transient send rate-limit errors (HTTP 429, per-chat code 230020, tenant-level code 11232) with linear backoff, including SDK responses that fulfill with rate-limit bodies instead of throwing, and route streaming-card sends through the retry wrapper. ([#89659](openclaw/openclaw#89659)) Thanks [@ladygege](https://github.com/ladygege).
- Release/CI/E2E: main CI guard drift, PR merge diff scoping, live Docker credential staging, base-image qualification, installer Docker classification, Playwright dependency install recovery, API-key auth for Codex live Docker lanes, Parallels option terminators, and JSON-mode progress handling are tighter, so release proof fails more cleanly. ([#90532](openclaw/openclaw#90532), [#90287](openclaw/openclaw#90287), [#90058](openclaw/openclaw#90058)) Thanks [@RomneyDa](https://github.com/RomneyDa), [@hxy91819](https://github.com/hxy91819), and [@mrunalp](https://github.com/mrunalp).
- Release/CI/E2E: Docker E2E and live Docker harness runs now apply default memory, CPU, and process ceilings while preserving explicit per-lane overrides.
- Release/CI/E2E: plugin lifecycle matrix resource sampling now fails phases that exceed RSS, wall-clock, or CPU ceilings instead of only logging the measurements.
- Release/CI/E2E: Codex npm plugin live assertions now cap transcript discovery and diagnostic log reads so failure proof stays bounded.
- Tests/state isolation: QA Lab valid-tool-call metrics now require runtime tool-call evidence when runtime parity data is available instead of counting tool-backed scenario pass status alone.
- Tests/state isolation: QA Lab runtime parity now fails planned-only tool-call rows without matching tool results instead of treating matching mock plans as real tool evidence.
- Tests/state isolation: provider, media, auth, cron, task, session, sandbox, Gateway, and Codex timeout fixtures now scope more home/state/env data per test, reducing cross-test leakage and making release validation failures less noisy. ([#90027](openclaw/openclaw#90027), [#89974](openclaw/openclaw#89974))
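
The single-provider re-probe item in the list above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the project's actual code: the function name `shouldProbeSingleProviderPrimary`, the `ProbeInput` shape, and the `PROBE_THROTTLE_MS` constant are hypothetical. Only the behavior comes from the PR summary: a non-primary candidate is never probed, and a primary with no fallbacks may be probed while the 30-second throttle slot is open.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not the real src/agents/model-fallback.ts implementation.
const PROBE_THROTTLE_MS = 30_000; // 30-second probe throttle from the PR summary

interface ProbeInput {
  isPrimary: boolean;
  hasFallbackCandidates: boolean;
  nowMs: number;
  lastProbeAtMs: number | null; // null when no probe has been attempted yet
}

function shouldProbeSingleProviderPrimary(input: ProbeInput): boolean {
  // A non-primary candidate is never probed during cooldown.
  if (!input.isPrimary) return false;
  // Primaries that do have fallbacks follow other rules, out of scope for this sketch.
  if (input.hasFallbackCandidates) return false;
  // Single-provider primary: probe only when the throttle slot is open,
  // regardless of how far in the future the provider-reported block ends.
  return (
    input.lastProbeAtMs === null ||
    input.nowMs - input.lastProbeAtMs >= PROBE_THROTTLE_MS
  );
}
```

The point of the sketch is that the provider-reported `blockedUntil` timestamp does not appear in the decision at all: with `fallbacks: []`, only the throttle limits how often the primary is retried.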

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/963

Development

Successfully merging this pull request may close these issues.

[Bug]: blockedUntil for subscription_limit set far in the future never re-probes when no fallback is configured
