fix(queue): restart drain when message enqueued after idle window by Lanfei · Pull Request #31902 · openclaw/openclaw

Lanfei · 2026-03-02T16:35:13Z

Summary

Problem: When the drain loop empties a followup queue, it deletes the key from FOLLOWUP_QUEUES. If a new message arrives in the narrow window right after that deletion, enqueueFollowupRun creates a fresh queue object with draining: false, but nothing schedules a new drain, so the message is stranded until the next run completes and calls finalizeWithFollowup.
Why it matters: In the common "user sends a follow-up right after the agent replies" scenario the message is silently delayed, appearing to the user as a non-response.
What changed: drain.ts now maintains a FOLLOWUP_RUN_CALLBACKS map that persists the most recent runFollowup callback per queue key. scheduleFollowupDrain writes the callback on every call; enqueueFollowupRun calls kickFollowupDrainIfIdle after pushing an item if !queue.draining, restarting the drain immediately using the cached callback. clearSessionQueues cleans up the callback cache alongside the queue state.
What did NOT change: The normal drain execution path, deduplication, drop policy, cap, and all queue modes (collect / steer / followup) are untouched.
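
The mechanism described above can be sketched as a small, self-contained model. This is a simplified illustration and not the code from the PR: only the identifier names come from the description, while the types, the signatures, and the synchronous drain loop are assumptions made to keep the example short (the real drain is presumably asynchronous).

```typescript
// Simplified model. Types and signatures are assumptions for illustration only.
type FollowupRun = { text: string };
type RunFollowup = (run: FollowupRun) => void;
type QueueState = { items: FollowupRun[]; draining: boolean };

const FOLLOWUP_QUEUES = new Map<string, QueueState>();
const FOLLOWUP_RUN_CALLBACKS = new Map<string, RunFollowup>();

function scheduleFollowupDrain(key: string, runFollowup: RunFollowup): void {
  // Remember the most recent callback so a later enqueue can restart the drain.
  FOLLOWUP_RUN_CALLBACKS.set(key, runFollowup);
  const queue = FOLLOWUP_QUEUES.get(key);
  if (!queue || queue.draining) return; // nothing to drain, or a drain is already running
  queue.draining = true;
  try {
    while (queue.items.length > 0) {
      runFollowup(queue.items.shift()!);
    }
  } finally {
    queue.draining = false;
    // The emptied queue is removed; only the callback cache survives.
    if (FOLLOWUP_QUEUES.get(key) === queue && queue.items.length === 0) {
      FOLLOWUP_QUEUES.delete(key);
    }
  }
}

function kickFollowupDrainIfIdle(key: string): void {
  const cached = FOLLOWUP_RUN_CALLBACKS.get(key);
  if (cached) scheduleFollowupDrain(key, cached);
}

function enqueueFollowupRun(key: string, run: FollowupRun): void {
  let queue = FOLLOWUP_QUEUES.get(key);
  if (!queue) {
    queue = { items: [], draining: false };
    FOLLOWUP_QUEUES.set(key, queue);
  }
  queue.items.push(run);
  // The fix: restart the drain when no drain is currently running.
  if (!queue.draining) kickFollowupDrainIfIdle(key);
}

function clearSessionQueues(key: string): void {
  FOLLOWUP_QUEUES.delete(key);
  FOLLOWUP_RUN_CALLBACKS.delete(key); // prevents a zombie drain after teardown
}

const processed: string[] = [];
const runFollowup: RunFollowup = (run) => {
  processed.push(run.text);
};

enqueueFollowupRun("session-1", { text: "first" }); // no callback cached yet: waits
scheduleFollowupDrain("session-1", runFollowup); // drains "first", deletes the queue key
enqueueFollowupRun("session-1", { text: "second" }); // idle window: cached callback restarts the drain
clearSessionQueues("session-1");
enqueueFollowupRun("session-1", { text: "third" }); // callback cleared: stays queued
// processed is now ["first", "second"]
```

In this model, removing the last line of enqueueFollowupRun reproduces the original bug: "second" would sit in a fresh queue with draining: false until something else called scheduleFollowupDrain.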

Change Type

Bug fix

Scope

Gateway / orchestration

Linked Issue/PR

Closes #

User-visible / Behavior Changes

Messages that arrive during the idle window immediately after an agent reply are now processed without delay, instead of waiting for the next run to complete.

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Steps

1. Send a message and let the agent reply (the drain loop runs to completion and the queue key is deleted).
2. Immediately send another message before the next run callback fires.
3. Observe whether the second message is processed.

Expected

Second message is processed and a reply is delivered promptly.

Actual (before fix)

Second message is silently held until the next run completes and calls finalizeWithFollowup.

Evidence

Three new unit tests that reproduce the exact timing and verify the fix:
- processes a message enqueued after the drain empties and deletes the queue — core race condition
- does not double-drain when a message arrives while drain is still running — regression guard against duplicate drains
- does not process messages after clearSessionQueues clears the callback — zombie-drain guard after session teardown

✓ followup queue drain restart after idle window > processes a message enqueued after the drain empties and deletes the queue
✓ followup queue drain restart after idle window > does not double-drain when a message arrives while drain is still running
✓ followup queue drain restart after idle window > does not process messages after clearSessionQueues clears the callback
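
One way to reproduce this timing deterministically in a unit test is to hold the first run open with a promise that the test resolves by hand. The sketch below is hypothetical: `createQueueModel`, `deferred`, and `reproduceTiming` are illustrative names rather than helpers from the PR's test suite, and the queue model is a simplified stand-in for the real module.

```typescript
// Hypothetical stand-ins for the real queue module; signatures are assumptions.
type Run = { text: string };
type RunFollowup = (run: Run) => Promise<void>;
type Queue = { items: Run[]; draining: boolean };

function createQueueModel() {
  const queues = new Map<string, Queue>();
  const callbacks = new Map<string, RunFollowup>();
  let drainsStarted = 0;

  function scheduleDrain(key: string, run: RunFollowup): Promise<void> {
    callbacks.set(key, run);
    const queue = queues.get(key);
    if (!queue || queue.draining) return Promise.resolve();
    queue.draining = true;
    drainsStarted += 1;
    return (async () => {
      try {
        while (queue.items.length > 0) await run(queue.items.shift()!);
      } finally {
        queue.draining = false;
        if (queues.get(key) === queue && queue.items.length === 0) queues.delete(key);
      }
    })();
  }

  function enqueue(key: string, item: Run): Promise<void> {
    let queue = queues.get(key);
    if (!queue) {
      queue = { items: [], draining: false };
      queues.set(key, queue);
    }
    queue.items.push(item);
    const cached = callbacks.get(key);
    return !queue.draining && cached ? scheduleDrain(key, cached) : Promise.resolve();
  }

  return { scheduleDrain, enqueue, drainCount: () => drainsStarted };
}

// A promise whose resolution the test controls, used to hold a drain mid-flight.
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

async function reproduceTiming(): Promise<{ seen: string[]; drains: number[] }> {
  const model = createQueueModel();
  const gate = deferred();
  const seen: string[] = [];
  const drains: number[] = [];
  const run: RunFollowup = async (r) => {
    if (r.text === "first") await gate.promise; // hold the first run open
    seen.push(r.text);
  };

  void model.enqueue("k", { text: "first" }); // no callback cached yet: waits
  const firstDrain = model.scheduleDrain("k", run); // drain starts and blocks on the gate
  const during = model.enqueue("k", { text: "during" }); // arrives while the drain is running
  drains.push(model.drainCount()); // no second drain was started
  gate.resolve();
  await firstDrain;
  await during;
  await model.enqueue("k", { text: "after" }); // arrives in the idle window
  drains.push(model.drainCount()); // exactly one restart
  return { seen, drains };
}
```

Under these assumptions, `reproduceTiming()` resolves with `seen` equal to `["first", "during", "after"]` and `drains` equal to `[1, 2]`: the message that arrives mid-drain is picked up by the running loop, and the message that arrives in the idle window starts exactly one new drain.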

Human Verification

Verified scenarios: all three test cases pass locally; full reply-flow.test.ts suite (47 tests) passes
Edge cases checked: no duplicate drain triggered when draining: true; callback correctly cleared after clearSessionQueues
What you did not verify: behaviour under high-concurrency multi-session production load

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery

How to revert: git revert this commit; no external state is affected
Known bad symptoms: if this introduces a regression, watch for duplicate followup deliveries; drain errors will appear in the logs as repeated "followup queue drain failed for <key>" entries

Risks and Mitigations

Risk: FOLLOWUP_RUN_CALLBACKS retains a callback reference until clearSessionQueues is called; an abnormal session exit that skips cleanup would leave a stale entry
- Mitigation: lifecycle matches the existing FOLLOWUP_QUEUES map; all normal session-end paths go through clearSessionQueues

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87baf4d74a

src/auto-reply/reply/queue/drain.ts

greptile-apps · 2026-03-02T16:47:02Z

Greptile Summary

This PR fixes a race condition where messages enqueued immediately after a drain completes would be stranded until the next run triggers. The fix introduces a callback cache (FOLLOWUP_RUN_CALLBACKS) that persists the drain callback even after the queue is deleted, allowing enqueueFollowupRun to automatically restart the drain when a message arrives during the post-drain idle window.

Key changes:

drain.ts: Added FOLLOWUP_RUN_CALLBACKS map to persist drain callbacks, new kickFollowupDrainIfIdle function to restart idle drains
enqueue.ts: After enqueueing, automatically kicks drain if !queue.draining
cleanup.ts: Clears callback cache alongside queue cleanup
Comprehensive test coverage (3 new tests) validates the fix and guards against regressions

The implementation is clean and minimal. The existing draining flag prevents duplicate drains in all edge cases. Normal drain execution, deduplication, and queue modes remain untouched.

Confidence Score: 5/5

This PR is safe to merge with minimal risk
The fix is well-designed with minimal, focused changes. Three comprehensive tests validate the fix and guard against double-drain and zombie-drain regressions. The implementation leverages the existing draining flag to prevent race conditions. Callback lifecycle matches existing queue lifecycle. No security, compatibility, or logical issues found.
No files require special attention

Last reviewed commit: 87baf4d

After a drain loop empties the queue it deletes the key from FOLLOWUP_QUEUES. If a new message arrives at that moment enqueueFollowupRun creates a fresh queue object with draining:false but never starts a drain, leaving the message stranded until the next run completes and calls finalizeWithFollowup. Fix: persist the most recent runFollowup callback per queue key in FOLLOWUP_RUN_CALLBACKS (drain.ts). enqueueFollowupRun now calls kickFollowupDrainIfIdle after a successful push; if a cached callback exists and no drain is running it calls scheduleFollowupDrain to restart immediately. clearSessionQueues cleans up the callback cache alongside the queue state.

steipete · 2026-03-02T19:38:17Z

Landed via temp rebase onto main.

Gate: pnpm -s vitest run src/auto-reply/reply/reply-flow.test.ts -t "followup queue drain restart after idle window"
Land commit: cd75f83
Merge commit: b645654

Thanks @Lanfei!

@afurm

* fix(plugins): fallback install entrypoints for legacy manifests
* Voice Call: enforce exact webhook path match
* Tests: isolate webhook path suite and reset cron auth state
* chore: keep #31930 scoped to voice webhook path fix
* fix: add changelog for exact voice webhook path match (#31930) (thanks @afurm)
* fix: handle HTTP 529 (Anthropic overloaded) in failover error classification
  Classify Anthropic's 529 status code as "rate_limit" so model fallback triggers reliably without depending on fragile message-based detection.
  Closes #28502
* fix: add changelog for HTTP 529 failover classification (#31854) (thanks @bugkill3r)
* fix(slack): guard against undefined text in includes calls during mention handling
* fix: add changelog for mentions/slack null-safe guards (#31865) (thanks @stone-jin)
* fix(memory-lancedb): pass dimensions to embedding API call
  - Add dimensions parameter to Embeddings constructor
  - Pass dimensions to OpenAI embeddings.create() API call
  - Fixes dimension mismatch when using custom embedding models like DashScope text-embedding-v4
* fix: add regression for memory-lancedb dimensions pass-through (#32036) (thanks @scotthuang)
* fix(telegram): guard malformed native menu specs
* fix: harden plugin command registration + telegram menu guard (#31997) (thanks @liuxiaopai-ai)
* fix(gateway): restart heartbeat on model config changes
* fix: add changelog credit for heartbeat model reload (#32046) (thanks @stakeswky)
* test(process): replace no-output timer subprocess with spawn mock
* test(perf): trim repeated setup in cron memory and config suites
* test(perf): reduce per-case setup in script and git-hook tests
* fix(slack): scope debounce key by message timestamp to prevent cross-thread collisions
  Top-level channel messages from the same sender shared a bare channel debounce key, causing concurrent messages in different threads to merge into a single reply on the wrong thread.
  Now the debounce key includes the message timestamp for top-level messages, matching how the downstream session layer already scopes by canonicalThreadId. Extracted buildSlackDebounceKey() for testability.
  Closes #31935
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: harden slack debounce key routing and ordering (#31951) (thanks @scoootscooob)
* fix(openrouter): skip reasoning.effort injection for x-ai/grok models
  x-ai/grok models on OpenRouter do not support the reasoning.effort parameter and reject payloads containing it with "Invalid arguments passed to the model." Skip reasoning injection for these models, the same way we already skip it for the dynamic "auto" routing model.
  Closes #32039
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add changelog credit for openrouter x-ai reasoning guard (#32054) (thanks @scoootscooob)
* fix(agents): scope volcengine-plan/byteplus-plan auth lookup to profile resolution
  The configure flow stores auth credentials under `provider: "volcengine"`, but the coding model uses `volcengine-plan` as its provider. Add a scoped `normalizeProviderIdForAuth` function used only by `listProfilesForProvider` so coding-plan variants resolve to their base provider for auth credential lookup without affecting global provider routing.
  Closes #31731
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tools): honor fsPolicy.workspaceOnly in image/pdf tool localRoots
  PR #28822 fixed the Write/Edit tools to respect `tools.fs.workspaceOnly`, but the image and PDF tools still unconditionally include default local roots (`~/.openclaw/media`, `~/.openclaw/agents`, etc.) when computing the `localRoots` allowlist for non-sandbox mode. When `fsPolicy.workspaceOnly` is true, restrict `localRoots` to only the workspace directory so that files outside the workspace are rejected by `assertLocalMediaAllowed()`.
  Relates to #31716
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add changelog credit for fsPolicy image/pdf propagation (#31882) (thanks @justinhuangcode)
* fix: skip Telegram command sync when menu is unchanged (#32017)
  Hash the command list and cache it to disk per account. On restart, compare the current hash against the cached one and skip the deleteMyCommands + setMyCommands round-trip when nothing changed. This prevents 429 rate-limit errors when the gateway restarts several times in quick succession.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(telegram): scope command-sync hash cache by bot identity (#32059)
* fix: normalize coding-plan providers in auth order validation
* feat(security): Harden Docker browser container chromium flags (#23889) (#31504)
* Gateway: honor OPENCLAW_GATEWAY_URL override for remote/local calls
* Agents: fix sandbox sessionKey usage for PI embedded subagent calls
* Sandbox: tighten browser container Chromium runtime flags
* fix: add sandbox browser defaults for container hardening
* docs: expand sandbox browser default flags list
* fix: make sandbox browser flags optional and preserve gateway env auth overrides
* docs: scope PR 31504 changelog entry
* style: format gateway call override handling
* fix: dedupe sandbox browser chrome args
* fix: preserve remote tls fingerprint for env gateway override
* fix: enforce auth for env gateway URL override
* chore: document gateway override auth security expectations
* fix(delivery): strip HTML tags for plain-text messaging surfaces
  Models occasionally produce HTML tags in their output. While these render fine on web surfaces, they appear as literal text on WhatsApp, Signal, SMS, IRC, and Telegram. Add sanitizeForPlainText() utility that converts common inline HTML to lightweight-markup equivalents and strips remaining tags. Applied in the outbound delivery pipeline for non-HTML surfaces only.
  Closes #31884
  See also: #18558
* fix(outbound): harden plain-text HTML sanitization paths (#32034)
* fix(security): harden file installs and race-path tests
* matrix: bootstrap crypto runtime when npm scripts are skipped
* fix(matrix): keep plugin register sync while bootstrapping crypto runtime (#31989)
* perf(runtime): reduce cron persistence and logger overhead
* test(perf): use prebuilt plugin install archive fixtures
* test(perf): increase guardrail scan read concurrency
* fix(queue): restart drain when message enqueued after idle window
  After a drain loop empties the queue it deletes the key from FOLLOWUP_QUEUES. If a new message arrives at that moment enqueueFollowupRun creates a fresh queue object with draining:false but never starts a drain, leaving the message stranded until the next run completes and calls finalizeWithFollowup.
  Fix: persist the most recent runFollowup callback per queue key in FOLLOWUP_RUN_CALLBACKS (drain.ts). enqueueFollowupRun now calls kickFollowupDrainIfIdle after a successful push; if a cached callback exists and no drain is running it calls scheduleFollowupDrain to restart immediately. clearSessionQueues cleans up the callback cache alongside the queue state.
* fix: avoid stale followup drain callbacks (#31902) (thanks @Lanfei)
* fix(synology-chat): read cfg from outbound context so incomingUrl resolves
* fix: require openclaw.extensions for plugin installs (#32055) (thanks @liuxiaopai-ai)

---------

Co-authored-by: Andrii Furmanets <furmanets.andriy@gmail.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Saurabh <skmishra1991@gmail.com>
Co-authored-by: stone-jin <1520006273@qq.com>
Co-authored-by: scotthuang <scotthuang@tencent.com>
Co-authored-by: User <user@example.com>
Co-authored-by: scoootscooob <zhentongfan@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: justinhuangcode <justinhuangcode@users.noreply.github.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: AytuncYildizli <cryptosquanch@gmail.com>
Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com>
Co-authored-by: Jealous <CooLanfei@163.com>
Co-authored-by: white-rm <zhang.xujin@xydigit.com>

…anfei

…anfei)

@afurm

* fix(plugins): fallback install entrypoints for legacy manifests * Voice Call: enforce exact webhook path match * Tests: isolate webhook path suite and reset cron auth state * chore: keep openclaw#31930 scoped to voice webhook path fix * fix: add changelog for exact voice webhook path match (openclaw#31930) (thanks @afurm) * fix: handle HTTP 529 (Anthropic overloaded) in failover error classification Classify Anthropic's 529 status code as "rate_limit" so model fallback triggers reliably without depending on fragile message-based detection. Closes openclaw#28502 * fix: add changelog for HTTP 529 failover classification (openclaw#31854) (thanks @bugkill3r) * fix(slack): guard against undefined text in includes calls during mention handling * fix: add changelog for mentions/slack null-safe guards (openclaw#31865) (thanks @stone-jin) * fix(memory-lancedb): pass dimensions to embedding API call - Add dimensions parameter to Embeddings constructor - Pass dimensions to OpenAI embeddings.create() API call - Fixes dimension mismatch when using custom embedding models like DashScope text-embedding-v4 * fix: add regression for memory-lancedb dimensions pass-through (openclaw#32036) (thanks @scotthuang) * fix(telegram): guard malformed native menu specs * fix: harden plugin command registration + telegram menu guard (openclaw#31997) (thanks @liuxiaopai-ai) * fix(gateway): restart heartbeat on model config changes * fix: add changelog credit for heartbeat model reload (openclaw#32046) (thanks @stakeswky) * test(process): replace no-output timer subprocess with spawn mock * test(perf): trim repeated setup in cron memory and config suites * test(perf): reduce per-case setup in script and git-hook tests * fix(slack): scope debounce key by message timestamp to prevent cross-thread collisions Top-level channel messages from the same sender shared a bare channel debounce key, causing concurrent messages in different threads to merge into a single reply on the wrong thread. 
Now the debounce key includes the message timestamp for top-level messages, matching how the downstream session layer already scopes by canonicalThreadId. Extracted buildSlackDebounceKey() for testability. Closes openclaw#31935 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden slack debounce key routing and ordering (openclaw#31951) (thanks @scoootscooob) * fix(openrouter): skip reasoning.effort injection for x-ai/grok models x-ai/grok models on OpenRouter do not support the reasoning.effort parameter and reject payloads containing it with "Invalid arguments passed to the model." Skip reasoning injection for these models, the same way we already skip it for the dynamic "auto" routing model. Closes openclaw#32039 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add changelog credit for openrouter x-ai reasoning guard (openclaw#32054) (thanks @scoootscooob) * fix(agents): scope volcengine-plan/byteplus-plan auth lookup to profile resolution The configure flow stores auth credentials under `provider: "volcengine"`, but the coding model uses `volcengine-plan` as its provider. Add a scoped `normalizeProviderIdForAuth` function used only by `listProfilesForProvider` so coding-plan variants resolve to their base provider for auth credential lookup without affecting global provider routing. Closes openclaw#31731 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tools): honor fsPolicy.workspaceOnly in image/pdf tool localRoots PR openclaw#28822 fixed the Write/Edit tools to respect `tools.fs.workspaceOnly`, but the image and PDF tools still unconditionally include default local roots (`~/.openclaw/media`, `~/.openclaw/agents`, etc.) when computing the `localRoots` allowlist for non-sandbox mode. When `fsPolicy.workspaceOnly` is true, restrict `localRoots` to only the workspace directory so that files outside the workspace are rejected by `assertLocalMediaAllowed()`. 
Relates to openclaw#31716 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add changelog credit for fsPolicy image/pdf propagation (openclaw#31882) (thanks @justinhuangcode) * fix: skip Telegram command sync when menu is unchanged (openclaw#32017) Hash the command list and cache it to disk per account. On restart, compare the current hash against the cached one and skip the deleteMyCommands + setMyCommands round-trip when nothing changed. This prevents 429 rate-limit errors when the gateway restarts several times in quick succession. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(telegram): scope command-sync hash cache by bot identity (openclaw#32059) * fix: normalize coding-plan providers in auth order validation * feat(security): Harden Docker browser container chromium flags (openclaw#23889) (openclaw#31504) * Gateway: honor OPENCLAW_GATEWAY_URL override for remote/local calls * Agents: fix sandbox sessionKey usage for PI embedded subagent calls * Sandbox: tighten browser container Chromium runtime flags * fix: add sandbox browser defaults for container hardening * docs: expand sandbox browser default flags list * fix: make sandbox browser flags optional and preserve gateway env auth overrides * docs: scope PR 31504 changelog entry * style: format gateway call override handling * fix: dedupe sandbox browser chrome args * fix: preserve remote tls fingerprint for env gateway override * fix: enforce auth for env gateway URL override * chore: document gateway override auth security expectations * fix(delivery): strip HTML tags for plain-text messaging surfaces Models occasionally produce HTML tags in their output. While these render fine on web surfaces, they appear as literal text on WhatsApp, Signal, SMS, IRC, and Telegram. Add sanitizeForPlainText() utility that converts common inline HTML to lightweight-markup equivalents and strips remaining tags. Applied in the outbound delivery pipeline for non-HTML surfaces only. 
Closes openclaw#31884 See also: openclaw#18558 * fix(outbound): harden plain-text HTML sanitization paths (openclaw#32034) * fix(security): harden file installs and race-path tests * matrix: bootstrap crypto runtime when npm scripts are skipped * fix(matrix): keep plugin register sync while bootstrapping crypto runtime (openclaw#31989) * perf(runtime): reduce cron persistence and logger overhead * test(perf): use prebuilt plugin install archive fixtures * test(perf): increase guardrail scan read concurrency * fix(queue): restart drain when message enqueued after idle window After a drain loop empties the queue it deletes the key from FOLLOWUP_QUEUES. If a new message arrives at that moment enqueueFollowupRun creates a fresh queue object with draining:false but never starts a drain, leaving the message stranded until the next run completes and calls finalizeWithFollowup. Fix: persist the most recent runFollowup callback per queue key in FOLLOWUP_RUN_CALLBACKS (drain.ts). enqueueFollowupRun now calls kickFollowupDrainIfIdle after a successful push; if a cached callback exists and no drain is running it calls scheduleFollowupDrain to restart immediately. clearSessionQueues cleans up the callback cache alongside the queue state. 
* fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei) * fix(synology-chat): read cfg from outbound context so incomingUrl resolves * fix: require openclaw.extensions for plugin installs (openclaw#32055) (thanks @liuxiaopai-ai) --------- Co-authored-by: Andrii Furmanets <furmanets.andriy@gmail.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: Saurabh <skmishra1991@gmail.com> Co-authored-by: stone-jin <1520006273@qq.com> Co-authored-by: scotthuang <scotthuang@tencent.com> Co-authored-by: User <user@example.com> Co-authored-by: scoootscooob <zhentongfan@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: justinhuangcode <justinhuangcode@users.noreply.github.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: AytuncYildizli <cryptosquanch@gmail.com> Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com> Co-authored-by: Jealous <CooLanfei@163.com> Co-authored-by: white-rm <zhang.xujin@xydigit.com>

…anfei

…anfei)

@afurm

* fix(plugins): fallback install entrypoints for legacy manifests * Voice Call: enforce exact webhook path match * Tests: isolate webhook path suite and reset cron auth state * chore: keep openclaw#31930 scoped to voice webhook path fix * fix: add changelog for exact voice webhook path match (openclaw#31930) (thanks @afurm) * fix: handle HTTP 529 (Anthropic overloaded) in failover error classification Classify Anthropic's 529 status code as "rate_limit" so model fallback triggers reliably without depending on fragile message-based detection. Closes openclaw#28502 * fix: add changelog for HTTP 529 failover classification (openclaw#31854) (thanks @bugkill3r) * fix(slack): guard against undefined text in includes calls during mention handling * fix: add changelog for mentions/slack null-safe guards (openclaw#31865) (thanks @stone-jin) * fix(memory-lancedb): pass dimensions to embedding API call - Add dimensions parameter to Embeddings constructor - Pass dimensions to OpenAI embeddings.create() API call - Fixes dimension mismatch when using custom embedding models like DashScope text-embedding-v4 * fix: add regression for memory-lancedb dimensions pass-through (openclaw#32036) (thanks @scotthuang) * fix(telegram): guard malformed native menu specs * fix: harden plugin command registration + telegram menu guard (openclaw#31997) (thanks @liuxiaopai-ai) * fix(gateway): restart heartbeat on model config changes * fix: add changelog credit for heartbeat model reload (openclaw#32046) (thanks @stakeswky) * test(process): replace no-output timer subprocess with spawn mock * test(perf): trim repeated setup in cron memory and config suites * test(perf): reduce per-case setup in script and git-hook tests * fix(slack): scope debounce key by message timestamp to prevent cross-thread collisions Top-level channel messages from the same sender shared a bare channel debounce key, causing concurrent messages in different threads to merge into a single reply on the wrong thread. 
Now the debounce key includes the message timestamp for top-level messages, matching how the downstream session layer already scopes by canonicalThreadId. Extracted buildSlackDebounceKey() for testability. Closes openclaw#31935 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden slack debounce key routing and ordering (openclaw#31951) (thanks @scoootscooob) * fix(openrouter): skip reasoning.effort injection for x-ai/grok models x-ai/grok models on OpenRouter do not support the reasoning.effort parameter and reject payloads containing it with "Invalid arguments passed to the model." Skip reasoning injection for these models, the same way we already skip it for the dynamic "auto" routing model. Closes openclaw#32039 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add changelog credit for openrouter x-ai reasoning guard (openclaw#32054) (thanks @scoootscooob) * fix(agents): scope volcengine-plan/byteplus-plan auth lookup to profile resolution The configure flow stores auth credentials under `provider: "volcengine"`, but the coding model uses `volcengine-plan` as its provider. Add a scoped `normalizeProviderIdForAuth` function used only by `listProfilesForProvider` so coding-plan variants resolve to their base provider for auth credential lookup without affecting global provider routing. Closes openclaw#31731 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tools): honor fsPolicy.workspaceOnly in image/pdf tool localRoots PR openclaw#28822 fixed the Write/Edit tools to respect `tools.fs.workspaceOnly`, but the image and PDF tools still unconditionally include default local roots (`~/.openclaw/media`, `~/.openclaw/agents`, etc.) when computing the `localRoots` allowlist for non-sandbox mode. When `fsPolicy.workspaceOnly` is true, restrict `localRoots` to only the workspace directory so that files outside the workspace are rejected by `assertLocalMediaAllowed()`. 
Relates to openclaw#31716 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add changelog credit for fsPolicy image/pdf propagation (openclaw#31882) (thanks @justinhuangcode) * fix: skip Telegram command sync when menu is unchanged (openclaw#32017) Hash the command list and cache it to disk per account. On restart, compare the current hash against the cached one and skip the deleteMyCommands + setMyCommands round-trip when nothing changed. This prevents 429 rate-limit errors when the gateway restarts several times in quick succession. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(telegram): scope command-sync hash cache by bot identity (openclaw#32059) * fix: normalize coding-plan providers in auth order validation * feat(security): Harden Docker browser container chromium flags (openclaw#23889) (openclaw#31504) * Gateway: honor OPENCLAW_GATEWAY_URL override for remote/local calls * Agents: fix sandbox sessionKey usage for PI embedded subagent calls * Sandbox: tighten browser container Chromium runtime flags * fix: add sandbox browser defaults for container hardening * docs: expand sandbox browser default flags list * fix: make sandbox browser flags optional and preserve gateway env auth overrides * docs: scope PR 31504 changelog entry * style: format gateway call override handling * fix: dedupe sandbox browser chrome args * fix: preserve remote tls fingerprint for env gateway override * fix: enforce auth for env gateway URL override * chore: document gateway override auth security expectations * fix(delivery): strip HTML tags for plain-text messaging surfaces Models occasionally produce HTML tags in their output. While these render fine on web surfaces, they appear as literal text on WhatsApp, Signal, SMS, IRC, and Telegram. Add sanitizeForPlainText() utility that converts common inline HTML to lightweight-markup equivalents and strips remaining tags. Applied in the outbound delivery pipeline for non-HTML surfaces only. 
Closes openclaw#31884 See also: openclaw#18558 * fix(outbound): harden plain-text HTML sanitization paths (openclaw#32034) * fix(security): harden file installs and race-path tests * matrix: bootstrap crypto runtime when npm scripts are skipped * fix(matrix): keep plugin register sync while bootstrapping crypto runtime (openclaw#31989) * perf(runtime): reduce cron persistence and logger overhead * test(perf): use prebuilt plugin install archive fixtures * test(perf): increase guardrail scan read concurrency * fix(queue): restart drain when message enqueued after idle window After a drain loop empties the queue it deletes the key from FOLLOWUP_QUEUES. If a new message arrives at that moment enqueueFollowupRun creates a fresh queue object with draining:false but never starts a drain, leaving the message stranded until the next run completes and calls finalizeWithFollowup. Fix: persist the most recent runFollowup callback per queue key in FOLLOWUP_RUN_CALLBACKS (drain.ts). enqueueFollowupRun now calls kickFollowupDrainIfIdle after a successful push; if a cached callback exists and no drain is running it calls scheduleFollowupDrain to restart immediately. clearSessionQueues cleans up the callback cache alongside the queue state. 
* fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei) * fix(synology-chat): read cfg from outbound context so incomingUrl resolves * fix: require openclaw.extensions for plugin installs (openclaw#32055) (thanks @liuxiaopai-ai) --------- Co-authored-by: Andrii Furmanets <furmanets.andriy@gmail.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: Saurabh <skmishra1991@gmail.com> Co-authored-by: stone-jin <1520006273@qq.com> Co-authored-by: scotthuang <scotthuang@tencent.com> Co-authored-by: User <user@example.com> Co-authored-by: scoootscooob <zhentongfan@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: justinhuangcode <justinhuangcode@users.noreply.github.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: AytuncYildizli <cryptosquanch@gmail.com> Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com> Co-authored-by: Jealous <CooLanfei@163.com> Co-authored-by: white-rm <zhang.xujin@xydigit.com>

…anfei

…anfei)

@afurm

* fix(plugins): fallback install entrypoints for legacy manifests
* Voice Call: enforce exact webhook path match
* Tests: isolate webhook path suite and reset cron auth state
* chore: keep openclaw#31930 scoped to voice webhook path fix
* fix: add changelog for exact voice webhook path match (openclaw#31930) (thanks @afurm)
* fix: handle HTTP 529 (Anthropic overloaded) in failover error classification
  Classify Anthropic's 529 status code as "rate_limit" so model fallback triggers reliably without depending on fragile message-based detection.
  Closes openclaw#28502
* fix: add changelog for HTTP 529 failover classification (openclaw#31854) (thanks @bugkill3r)
* fix(slack): guard against undefined text in includes calls during mention handling
* fix: add changelog for mentions/slack null-safe guards (openclaw#31865) (thanks @stone-jin)
* fix(memory-lancedb): pass dimensions to embedding API call
  - Add dimensions parameter to Embeddings constructor
  - Pass dimensions to OpenAI embeddings.create() API call
  - Fixes dimension mismatch when using custom embedding models like DashScope text-embedding-v4
* fix: add regression for memory-lancedb dimensions pass-through (openclaw#32036) (thanks @scotthuang)
* fix(telegram): guard malformed native menu specs
* fix: harden plugin command registration + telegram menu guard (openclaw#31997) (thanks @liuxiaopai-ai)
* fix(gateway): restart heartbeat on model config changes
* fix: add changelog credit for heartbeat model reload (openclaw#32046) (thanks @stakeswky)
* test(process): replace no-output timer subprocess with spawn mock
* test(perf): trim repeated setup in cron memory and config suites
* test(perf): reduce per-case setup in script and git-hook tests
* fix(slack): scope debounce key by message timestamp to prevent cross-thread collisions
  Top-level channel messages from the same sender shared a bare channel debounce key, causing concurrent messages in different threads to merge into a single reply on the wrong thread.
  Now the debounce key includes the message timestamp for top-level messages, matching how the downstream session layer already scopes by canonicalThreadId. Extracted buildSlackDebounceKey() for testability.
  Closes openclaw#31935
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: harden slack debounce key routing and ordering (openclaw#31951) (thanks @scoootscooob)
* fix(openrouter): skip reasoning.effort injection for x-ai/grok models
  x-ai/grok models on OpenRouter do not support the reasoning.effort parameter and reject payloads containing it with "Invalid arguments passed to the model." Skip reasoning injection for these models, the same way we already skip it for the dynamic "auto" routing model.
  Closes openclaw#32039
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add changelog credit for openrouter x-ai reasoning guard (openclaw#32054) (thanks @scoootscooob)
* fix(agents): scope volcengine-plan/byteplus-plan auth lookup to profile resolution
  The configure flow stores auth credentials under `provider: "volcengine"`, but the coding model uses `volcengine-plan` as its provider. Add a scoped `normalizeProviderIdForAuth` function used only by `listProfilesForProvider` so coding-plan variants resolve to their base provider for auth credential lookup without affecting global provider routing.
  Closes openclaw#31731
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tools): honor fsPolicy.workspaceOnly in image/pdf tool localRoots
  PR openclaw#28822 fixed the Write/Edit tools to respect `tools.fs.workspaceOnly`, but the image and PDF tools still unconditionally include default local roots (`~/.openclaw/media`, `~/.openclaw/agents`, etc.) when computing the `localRoots` allowlist for non-sandbox mode. When `fsPolicy.workspaceOnly` is true, restrict `localRoots` to only the workspace directory so that files outside the workspace are rejected by `assertLocalMediaAllowed()`.
  Relates to openclaw#31716
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add changelog credit for fsPolicy image/pdf propagation (openclaw#31882) (thanks @justinhuangcode)
* fix: skip Telegram command sync when menu is unchanged (openclaw#32017)
  Hash the command list and cache it to disk per account. On restart, compare the current hash against the cached one and skip the deleteMyCommands + setMyCommands round-trip when nothing changed. This prevents 429 rate-limit errors when the gateway restarts several times in quick succession.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(telegram): scope command-sync hash cache by bot identity (openclaw#32059)
* fix: normalize coding-plan providers in auth order validation
* feat(security): Harden Docker browser container chromium flags (openclaw#23889) (openclaw#31504)
* Gateway: honor OPENCLAW_GATEWAY_URL override for remote/local calls
* Agents: fix sandbox sessionKey usage for PI embedded subagent calls
* Sandbox: tighten browser container Chromium runtime flags
* fix: add sandbox browser defaults for container hardening
* docs: expand sandbox browser default flags list
* fix: make sandbox browser flags optional and preserve gateway env auth overrides
* docs: scope PR 31504 changelog entry
* style: format gateway call override handling
* fix: dedupe sandbox browser chrome args
* fix: preserve remote tls fingerprint for env gateway override
* fix: enforce auth for env gateway URL override
* chore: document gateway override auth security expectations
* fix(delivery): strip HTML tags for plain-text messaging surfaces
  Models occasionally produce HTML tags in their output. While these render fine on web surfaces, they appear as literal text on WhatsApp, Signal, SMS, IRC, and Telegram. Add sanitizeForPlainText() utility that converts common inline HTML to lightweight-markup equivalents and strips remaining tags. Applied in the outbound delivery pipeline for non-HTML surfaces only.
  Closes openclaw#31884
  See also: openclaw#18558
* fix(outbound): harden plain-text HTML sanitization paths (openclaw#32034)
* fix(security): harden file installs and race-path tests
* matrix: bootstrap crypto runtime when npm scripts are skipped
* fix(matrix): keep plugin register sync while bootstrapping crypto runtime (openclaw#31989)
* perf(runtime): reduce cron persistence and logger overhead
* test(perf): use prebuilt plugin install archive fixtures
* test(perf): increase guardrail scan read concurrency
* fix(queue): restart drain when message enqueued after idle window
  After a drain loop empties the queue it deletes the key from FOLLOWUP_QUEUES. If a new message arrives at that moment enqueueFollowupRun creates a fresh queue object with draining:false but never starts a drain, leaving the message stranded until the next run completes and calls finalizeWithFollowup.
  Fix: persist the most recent runFollowup callback per queue key in FOLLOWUP_RUN_CALLBACKS (drain.ts). enqueueFollowupRun now calls kickFollowupDrainIfIdle after a successful push; if a cached callback exists and no drain is running it calls scheduleFollowupDrain to restart immediately. clearSessionQueues cleans up the callback cache alongside the queue state.
* fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)
* fix(synology-chat): read cfg from outbound context so incomingUrl resolves
* fix: require openclaw.extensions for plugin installs (openclaw#32055) (thanks @liuxiaopai-ai)

---------

Co-authored-by: Andrii Furmanets <furmanets.andriy@gmail.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Saurabh <skmishra1991@gmail.com>
Co-authored-by: stone-jin <1520006273@qq.com>
Co-authored-by: scotthuang <scotthuang@tencent.com>
Co-authored-by: User <user@example.com>
Co-authored-by: scoootscooob <zhentongfan@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: justinhuangcode <justinhuangcode@users.noreply.github.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: AytuncYildizli <cryptosquanch@gmail.com>
Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com>
Co-authored-by: Jealous <CooLanfei@163.com>
Co-authored-by: white-rm <zhang.xujin@xydigit.com>
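
The fix(queue) entry in the commit message above describes a callback cache that lets an enqueue restart an idle drain. A minimal sketch of that idea follows. It is not the code from drain.ts: the identifiers FOLLOWUP_QUEUES, FOLLOWUP_RUN_CALLBACKS, scheduleFollowupDrain, kickFollowupDrainIfIdle, enqueueFollowupRun and clearSessionQueues come from the commit message, but their signatures, the `FollowupRun`, `RunFollowup` and `FollowupQueue` types, and the synchronous drain loop are assumptions made for illustration. The real drain is presumably asynchronous and also handles deduplication, drop policy and queue modes, and the stale-callback handling from the follow-up commit is not modeled here.

```typescript
// Hypothetical, simplified types; the real queue state is richer.
type FollowupRun = { prompt: string };
type RunFollowup = (run: FollowupRun) => void;
type FollowupQueue = { items: FollowupRun[]; draining: boolean };

const FOLLOWUP_QUEUES = new Map<string, FollowupQueue>();
// Most recent runFollowup callback per queue key.
const FOLLOWUP_RUN_CALLBACKS = new Map<string, RunFollowup>();

function scheduleFollowupDrain(key: string, runFollowup: RunFollowup): void {
  // Cache the callback on every call so a later enqueue can restart the drain.
  FOLLOWUP_RUN_CALLBACKS.set(key, runFollowup);
  const queue = FOLLOWUP_QUEUES.get(key);
  if (!queue || queue.draining) return;
  queue.draining = true;
  try {
    while (queue.items.length > 0) {
      runFollowup(queue.items.shift()!);
    }
  } finally {
    queue.draining = false;
    // An emptied queue is dropped; this opens the idle window from the PR.
    if (queue.items.length === 0 && FOLLOWUP_QUEUES.get(key) === queue) {
      FOLLOWUP_QUEUES.delete(key);
    }
  }
}

function kickFollowupDrainIfIdle(key: string): void {
  const cached = FOLLOWUP_RUN_CALLBACKS.get(key);
  if (!cached) return; // no run has registered a callback for this key yet
  scheduleFollowupDrain(key, cached);
}

function enqueueFollowupRun(key: string, run: FollowupRun): void {
  let queue = FOLLOWUP_QUEUES.get(key);
  if (!queue) {
    queue = { items: [], draining: false };
    FOLLOWUP_QUEUES.set(key, queue);
  }
  queue.items.push(run);
  // Without this kick, an item pushed into a fresh queue would wait for the next run.
  if (!queue.draining) kickFollowupDrainIfIdle(key);
}

function clearSessionQueues(key: string): void {
  FOLLOWUP_QUEUES.delete(key);
  FOLLOWUP_RUN_CALLBACKS.delete(key);
}
```

In this sketch, an item enqueued after a completed drain is processed immediately because the cached callback is still present, while an item enqueued after clearSessionQueues stays queued until some run calls scheduleFollowupDrain again.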


openclaw-barnacle bot added the size: S label Mar 2, 2026

chatgpt-codex-connector bot reviewed Mar 2, 2026


src/auto-reply/reply/queue/drain.ts (outdated, resolved)

Lanfei and others added 2 commits March 2, 2026 19:33

fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)

cd75f83

steipete force-pushed the fix/queue-drain-idle-window branch from 87baf4d to cd75f83 Compare March 2, 2026 19:37

steipete merged commit b645654 into openclaw:main Mar 2, 2026
9 checks passed

steipete added a commit to liuxiaopai-ai/openclaw that referenced this pull request Mar 2, 2026

fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)

4c76a55

execute008 pushed a commit to execute008/openclaw that referenced this pull request Mar 2, 2026

fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)

b9b3a98

github-actions bot mentioned this pull request Mar 2, 2026

📡 Upstream Digest — 2026-03-02 20:27 UTC curtismercier/openclaw-mods#166

Open

Lanfei deleted the fix/queue-drain-idle-window branch March 3, 2026 03:45

dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026

fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)

c195d2a

OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026

fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)

6e9273e

zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026

fix: avoid stale followup drain callbacks (openclaw#31902) (thanks @Lanfei)

26842f1

steipete merged 2 commits into openclaw:main from Lanfei:fix/queue-drain-idle-window


greptile-apps bot commented Mar 2, 2026

Greptile Summary

Confidence Score: 5/5


steipete commented Mar 2, 2026
