fix(agents): bound session event-queue wait to release lock on wedged turns #1
matin wants to merge 246 commits into
…84846) Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
Route normal [telegram][diag] polling diagnostics through runtime.log while keeping non-diag Telegram warnings/errors and offset persistence failures on runtime.error.

Verification:
- node scripts/run-vitest.mjs extensions/telegram/src/monitor.test.ts (34 passed)
- git diff --check
- CI run 26378692736 passed on 979c6f3

Fixes openclaw#82957
Stabilize WebChat transcript/run-state truth for Codex and selected-session observers.

Summary:
- Mirror Codex inbound prompts at turn start without duplicating suppressed persisted prompts.
- Deliver hidden external-channel live chat/tool/agent updates only to exact selected-session subscribers.
- Repair Control UI selected-session subscription state, alias-aware run adoption, and accumulated stream dedupe.
- Add focused Codex, gateway/session-event, and Control UI regression coverage.

Verification:
- Current-head CI: 101 green, 0 pending; stale canceled entries are superseded automation from prior force-pushed heads.
- Local focused Vitest shards passed: Codex app-server 2 files / 233 tests, gateway/session 4 files / 116 tests, UI 7 files / 238 tests.
- `node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo`
- `node --import tsx scripts/check-no-extension-test-core-imports.ts`
- `git diff --check origin/main..HEAD`

Closes openclaw#83528. Closes openclaw#82611. Refs openclaw#83949.
…cross-process file lock (openclaw#86326)

Summary:
- The PR adds a commitments-store writer helper, wraps load-modify-save mutators and expiry cleanup with a per-path queue plus `withFileLock`, adds three concurrency regressions, and updates the changelog.
- PR surface: Source +153, Tests +61, Docs +1. Total +215 across 4 files.
- Reproducibility: yes. Source inspection on current main shows the unqueued load-modify-save mutation path, a ... inked proof log shows the Promise.all repro changing from 20/20 lost writes before the patch to 0/20 after.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(commitments): serialize load-modify-save with in-process queue + …

Validation:
- ClawSweeper review passed for head a349f41.
- Required merge gates passed before the squash merge.

Prepared head SHA: a349f41
Review: openclaw#86326 (comment)

Co-authored-by: ai-hpc <mail.speedy.hpc@hotmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
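The in-process half of that serialization can be illustrated with a small sketch. Everything here is hypothetical (`enqueue`, `appendUnsafe`, `appendQueued`, and the in-memory `store` are illustration-only names, not the PR's helper), and the cross-process `withFileLock` layer is deliberately left out. It only shows why chaining load-modify-save tasks per path stops concurrent callers from overwriting each other.

```typescript
// Hypothetical per-path queue: each path maps to the tail of its task chain.
const queues = new Map<string, Promise<unknown>>();

function enqueue<T>(path: string, task: () => Promise<T>): Promise<T> {
  const prev = queues.get(path) ?? Promise.resolve();
  // `prev` is always a tail that never rejects, so a failed task
  // does not block the tasks queued behind it.
  const next = prev.then(task);
  const tail = next.catch(() => undefined);
  queues.set(path, tail);
  // Drop the map entry once this path goes idle.
  void tail.then(() => {
    if (queues.get(path) === tail) queues.delete(path);
  });
  return next;
}

// Stand-in for a file-backed store (assumption: real code reads/writes disk).
const store = new Map<string, number[]>();
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 1));

// Unqueued load-modify-save: concurrent callers all load the same snapshot.
async function appendUnsafe(path: string, value: number): Promise<void> {
  const current = store.get(path) ?? []; // load
  await tick(); // simulated I/O gap
  store.set(path, [...current, value]); // save
}

// Queued variant: the same mutation, but one at a time per path.
function appendQueued(path: string, value: number): Promise<void> {
  return enqueue(path, () => appendUnsafe(path, value));
}
```

Running twenty `appendQueued` calls under `Promise.all` keeps all twenty entries, while the same calls through `appendUnsafe` lose writes because every caller saves on top of the same stale snapshot.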
Fixes openclaw#12831.

Adds a Control UI Activity tab at `/activity` under the Control nav group. The tab derives browser-local, memory-only activity entries from the existing `session.tool` / tool-event delivery path and stores only sanitized summaries, hidden-argument counts, and redacted/truncated output previews. Includes filtering, tool selection, clear, expand/collapse, keyboard-native disclosure rows, auto-follow scrolling, navigation/i18n/docs/changelog coverage, and focused regression tests.

Follow-up tracks openclaw#54577, openclaw#37816, and openclaw#47386 remain distinct and open.

Verification:
- `pnpm ui:i18n:sync`
- `git diff --check`
- Focused Vitest coverage for Activity, gateway/tool stream, chat item rendering, navigation, and gateway agent events
- Desktop/mobile browser smoke for sanitized Activity rendering and header de-duplication
- Testbox `pnpm check:changed`: `tbx_01ksen33c79b8rywayf6cxww4r`

Thanks @BunsDev.
Fix isolated cron delivery so agent-default derivation keeps using the paired runtime config snapshot, preserving resolved channel credentials such as Discord SecretRefs. Fixes openclaw#86545.
* refactor: share talk event metric extraction
* refactor: reuse shared coercion helpers
* refactor: reuse shared primitive guards
* refactor: reuse shared record guard
* refactor: reuse shared primitive helpers
* refactor: reuse shared string guards
* refactor: reuse shared non-empty string guard
* refactor: share plugin primitive coercion helpers
* refactor: reuse plugin coercion helpers
* refactor: reuse plugin coercion helpers in more plugins
* refactor: reuse channel coercion helpers
* refactor: reuse monitor coercion helpers
* refactor: reuse provider coercion helpers
* refactor: reuse core coercion helpers
* refactor: reuse runtime coercion helpers
* refactor: reuse helper coercion in codex paths
* refactor: reuse helper coercion in runtime paths
* refactor: reuse codex app-server coercion helpers
* refactor: reuse codex record helpers
* refactor: reuse migration and qa record helpers
* refactor: reuse feishu and core helper guards
* refactor: reuse browser and policy coercion helpers
* refactor: reuse memory wiki record helper
* refactor: share boolean coercion helpers
* refactor: reuse finite number coercion
* refactor: reuse trimmed string list helpers
* refactor: reuse string list normalization
* refactor: reuse remaining string list helpers
* refactor: reuse string entry normalizer
* refactor: share sorted string helpers
* refactor: share string list normalization
* test: preserve command registry browser imports
* refactor: reuse trimmed list helpers
* refactor: reuse string dedupe helpers
* refactor: reuse local dedupe helpers
* refactor: reuse more string dedupe helpers
* refactor: reuse command string dedupe helpers
* refactor: dedupe memory path lists with helper
* refactor: expose string dedupe helpers to plugins
* refactor: reuse core string dedupe helpers
* refactor: reuse shared unique value helpers
* refactor: reuse unique helpers in agent utilities
* refactor: reuse unique helpers in config plumbing
* refactor: reuse unique helpers in extensions
* refactor: reuse unique helpers in core utilities
* refactor: reuse unique helpers in qa plugins
* refactor: reuse unique helpers in memory plugins
* refactor: reuse unique helpers in channel plugins
* refactor: reuse unique helpers in core tails
* refactor: reuse unique helper in comfy workflow
* refactor: reuse unique helpers in test utilities
* refactor: expose unique value helper to plugins
* refactor: reuse unique helpers for numeric lists
* refactor: replace index dedupe filters
* refactor: reuse string entry normalization
* refactor: reuse string normalization in plugin helpers
* refactor: reuse string normalization in extension helpers
* refactor: reuse string normalization in channel parsers
* refactor: reuse string normalization in memory search
* refactor: reuse string normalization in provider parsers
* refactor: reuse string normalization in qa helpers
* refactor: reuse string normalization in infra parsers
* refactor: reuse string normalization in messaging parsers
* refactor: reuse string normalization in core parsers
* refactor: reuse string normalization in extension parsers
* refactor: reuse string normalization in remaining parsers
* refactor: reuse string normalization in final parser spots
* refactor: reuse string normalization in qa media helpers
* refactor: reuse normalization in provider and media lists
* refactor: reuse normalization for remaining set filters
* refactor: reuse normalization in policy allowlists
* refactor: reuse normalization in session and owner lists
* refactor: centralize primitive string lists
* refactor: reuse lowercase entry helpers
* refactor: reuse sorted string helpers
* refactor: reuse unique trimmed helpers
* refactor: reuse string normalization helpers
* refactor: reuse catalog string helpers
* refactor: reuse remaining string helpers
* refactor: simplify remaining list normalization
* refactor: reuse codex auth order normalization
* chore: refresh plugin sdk api baseline
* fix: make shared string sorting deterministic
* chore: refresh plugin sdk api baseline
* fix: align host env security ordering
Precompute FIR resample kernels for common voice sample-rate conversions to avoid per-sample trigonometry while preserving output for tested ratios.

Verification: node scripts/run-vitest.mjs extensions/voice-call/src/telephony-audio.test.ts; pnpm tsgo:core; autoreview --mode commit --commit HEAD; PR CI green.
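As a rough illustration of the idea (not the extension's actual filter design, which the commit message does not show), the sketch below builds a Hann-windowed sinc polyphase kernel once per rate pair and then resamples with multiply-adds only. The names `buildKernel`, `resample`, and `ResampleKernel`, the tap count, and the window choice are all assumptions made for this example.

```typescript
interface ResampleKernel {
  up: number; // interpolation factor (outRate / gcd)
  down: number; // decimation factor (inRate / gcd)
  halfTaps: number;
  phases: Float64Array[]; // one tap set per fractional phase
}

function gcd(a: number, b: number): number {
  while (b !== 0) [a, b] = [b, a % b];
  return a;
}

// All trigonometry happens here, once per (inRate, outRate) pair.
function buildKernel(inRate: number, outRate: number, halfTaps = 8): ResampleKernel {
  const g = gcd(inRate, outRate);
  const up = outRate / g;
  const down = inRate / g;
  const cutoff = Math.min(1, up / down); // relative to the input Nyquist rate
  const phases: Float64Array[] = [];
  for (let p = 0; p < up; p++) {
    const frac = p / up;
    const taps = new Float64Array(2 * halfTaps);
    let sum = 0;
    for (let k = 0; k < taps.length; k++) {
      const t = k - halfTaps + 1 - frac; // tap offset from the fractional position
      const x = Math.PI * cutoff * t;
      const sinc = x === 0 ? 1 : Math.sin(x) / x;
      const hann = 0.5 * (1 + Math.cos((Math.PI * t) / halfTaps));
      taps[k] = cutoff * sinc * hann;
      sum += taps[k];
    }
    for (let k = 0; k < taps.length; k++) taps[k] /= sum; // unity DC gain per phase
    phases.push(taps);
  }
  return { up, down, halfTaps, phases };
}

// The hot path: table lookups and multiply-adds, no sin/cos per sample.
function resample(input: Float64Array, kernel: ResampleKernel): Float64Array {
  const { up, down, halfTaps, phases } = kernel;
  const out = new Float64Array(Math.floor((input.length * up) / down));
  for (let n = 0; n < out.length; n++) {
    const pos = n * down;
    const base = Math.floor(pos / up);
    const taps = phases[pos % up];
    let acc = 0;
    for (let k = 0; k < taps.length; k++) {
      // Clamp at the edges so short buffers still produce output.
      const idx = Math.min(input.length - 1, Math.max(0, base + k - halfTaps + 1));
      acc += input[idx] * taps[k];
    }
    out[n] = acc;
  }
  return out;
}
```

A caller would typically build the kernel once for a pair such as 8000 to 16000 Hz and reuse it for every audio frame, which is where the saving over recomputing the window per sample comes from.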
Adds regression coverage for agents.defaults.agentRuntime schema acceptance and invalid-config doctor fix reachability. The runtime behavior fix already landed on main in 5b9be2c; this PR locks the expected behavior with focused tests. Closes openclaw#72872
Fix Gemini cached-content GenerateContent payloads so cached requests no longer resend request-level systemInstruction, tools, or toolConfig. Covers explicit cachedContent and managed cacheRetention prompt caching; fixes openclaw#84919. Proof: Real behavior proof passed on PR head 198a42b after live Gemini repro/fix evidence was added to the PR body. Focused tests and check:changed were already green. Thanks @neeravmakwana.
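A minimal sketch of the payload rule described above, assuming a simplified request shape: when `cachedContent` is set, the request-level `systemInstruction`, `tools`, and `toolConfig` are dropped before sending. The `GenerateContentPayload` interface and the `stripCachedRequestFields` name are hypothetical; only the four field names come from the commit message.

```typescript
// Simplified, hypothetical shape of a GenerateContent request body.
interface GenerateContentPayload {
  contents: unknown[];
  cachedContent?: string;
  systemInstruction?: unknown;
  tools?: unknown[];
  toolConfig?: unknown;
}

// Return a copy that omits request-level fields already captured by the cache.
function stripCachedRequestFields(payload: GenerateContentPayload): GenerateContentPayload {
  if (!payload.cachedContent) return payload; // uncached requests are left untouched
  const next: GenerateContentPayload = { ...payload };
  delete next.systemInstruction;
  delete next.tools;
  delete next.toolConfig;
  return next;
}
```

Copying before deleting keeps the caller's original payload intact, so the same object can still be used to build an uncached fallback request.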
Keep isolated cron announce delivery owned by runner fallback while leaving agent-initiated message sends optional. `delivery.mode: none` no longer forces message delivery, announce delivery skips fallback only after a verified same-target message-tool send, and prompt allowlist checks now match runtime tool policy normalization/group expansion. Verified with focused cron tests, `check:changed`, autoreview, and PR CI on 7ab77ba. Thanks @bryanpearson. Co-authored-by: bryanpearson <bryanmpearson@gmail.com>
* build: refresh dependencies
* build: align pi fallback version
…6408)

* fix(openai): route compaction through codex auth provider

  Co-authored-by: VACInc <3279061+VACInc@users.noreply.github.com>

* fix(openai): honor default responses compaction threshold

  Co-authored-by: VACInc <3279061+VACInc@users.noreply.github.com>

* fix(openai): preserve codex runtime routing
* docs(changelog): note Codex routing fix

---------

Co-authored-by: Merlin <258679497+funmerlin@users.noreply.github.com>
Co-authored-by: VACInc <3279061+VACInc@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
… (thanks @neeravmakwana)

Behavior addressed: Telegram direct-message turns no longer drop an earlier overlapping normal reply, while authorized aborts and explicit/native/plugin/skill command turns still supersede active reply work.
Real environment tested: local OpenClaw focused Telegram test shard plus existing contributor Telegram screenshot/log proof in the PR body.
Exact steps or command run after this patch: pnpm test extensions/telegram/src/telegram-reply-fence.test.ts extensions/telegram/src/bot-message-dispatch.test.ts; .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
Evidence after fix: 2 test files passed, 93 tests passed; final autoreview clean with no accepted/actionable findings.
Observed result after fix: overlapping normal Telegram DMs use non-interrupting reply fences and both final replies remain deliverable; direct /stop, authorized built-in commands, and explicit text/native command turns still supersede.
What was not tested: fresh live Telegram Desktop rerun by this agent; PR retains contributor screenshot/log proof and the Real behavior proof bot remains red despite proof labels.

Thanks @neeravmakwana.

Co-authored-by: Neerav Makwana <261249544+neeravmakwana@users.noreply.github.com>
…(thanks @spacegeologist)

Behavior addressed: Embedded PI compaction retry now drains block replies again after the retry wait resolves, so retry-generated replies are not left behind while preserving aggregate-timeout fallback behavior.
Real environment tested: local OpenClaw focused Pi runner test shard plus contributor local live-output proof in the PR body.
Exact steps or command run after this patch: pnpm test src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/run/compaction-retry-aggregate-timeout.test.ts; .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
Evidence after fix: 2 test files passed, 55 tests passed; final autoreview clean with no accepted/actionable findings.
Observed result after fix: the runner flushes before the compaction wait, waits for compaction retry, then performs a second idempotent flush when the wait resolves without timing out.
What was not tested: fresh external-channel live retry by this agent; PR retains contributor live-output proof for the delayed channel adapter path.

Thanks @spacegeologist.

Co-authored-by: zhengzuo0-ai <zheng.zuo0@gmail.com>
Reduce hot-path cache churn by reusing the active plugin metadata snapshot for manifest model-id normalization when safe, and by avoiding repeated JSON reparses for cached session stores while preserving clone semantics.

Verification:
- pnpm exec oxfmt --check src/plugins/manifest-model-id-normalization.ts src/plugins/manifest-model-id-normalization.test.ts src/config/sessions/store-cache.ts src/config/sessions.cache.test.ts
- node scripts/run-vitest.mjs src/config/sessions.cache.test.ts src/plugins/manifest-model-id-normalization.test.ts src/gateway/session-utils.subagent.test.ts
- pnpm tsgo:core
- autoreview clean
- PR CI green
…aw#80613) (openclaw#86645)

Summary:
- The PR extracts the CJK-aware memory tokenizer into a shared helper, routes dreaming dedupe through it, preserves MMR re-exports, and adds regression coverage for CJK and empty-token cases.
- PR surface: Source +15, Tests +96. Total +111 across 5 files.
- Reproducibility: yes. Current main has an ASCII-only tokenizeSnippet path in dreaming dedupe, and the source ... ction source bytes for the CJK failure modes; I did not run tests locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(memory-core): use Array.toSorted for openclaw#80613 lint fix
- PR branch already contained follow-up commit before automerge: fix(memory-core): preserve dedupe identity when both snippets tokeniz…
- PR branch already contained follow-up commit before automerge: fix(memory-core): rename __testing to testing in CJK regression tests…
- PR branch already contained follow-up commit before automerge: fix(memory-core): use CJK-aware tokenizer for dreaming dedupe (openclaw#80613)

Validation:
- ClawSweeper review passed for head ca9c027.
- Required merge gates passed before the squash merge.

Prepared head SHA: ca9c027
Review: openclaw#86645 (comment)

Co-authored-by: MoerAI <friendnt@g.skku.edu>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
… turns

A turn interrupted via sessions_yield can leave _agentEventQueue pending indefinitely. waitForSessionEventQueue sits on every session-write-lock release path (the in-loop waits and the cleanup-time acquireForCleanup), so an unbounded wait pinned the lock until the ~17min maxHoldMs watchdog. That orphaned the lock and dead-locked the background media completion-wake, which times out acquiring at 60s -> "completion delivery failed after successful generation" and a ~17min wedged session.

Bound the wait (OPENCLAW_SESSION_EVENT_QUEUE_WAIT_TIMEOUT_MS, default 5s; 250ms under OPENCLAW_TEST_FAST) so the lock always releases within a bounded window instead of waiting on the watchdog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
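The bounded wait described in this commit can be sketched roughly as below. This is not the PR's code: `resolveQueueWaitTimeoutMs` and `waitBounded` are hypothetical names, and the precedence between the explicit override and `OPENCLAW_TEST_FAST` is an assumption. Only the two environment variable names and the 5s / 250ms defaults come from the commit message.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_WAIT_MS = 5_000; // default from the commit message
const FAST_TEST_WAIT_MS = 250; // value under OPENCLAW_TEST_FAST

// Assumption: an explicit, positive override wins over the fast-test default.
function resolveQueueWaitTimeoutMs(env: Env): number {
  const raw = Number(env.OPENCLAW_SESSION_EVENT_QUEUE_WAIT_TIMEOUT_MS);
  if (Number.isFinite(raw) && raw > 0) return raw;
  return env.OPENCLAW_TEST_FAST ? FAST_TEST_WAIT_MS : DEFAULT_WAIT_MS;
}

// Wait for the queue to settle, but never longer than timeoutMs, so the
// caller can go on to release the lock even if the queue is wedged.
async function waitBounded(
  pending: Promise<unknown>,
  timeoutMs: number,
): Promise<"settled" | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  try {
    return await Promise.race([
      // A rejected queue still counts as settled for lock-release purposes.
      pending.then(() => "settled" as const, () => "settled" as const),
      timeout,
    ]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // do not keep the process alive
  }
}
```

A release path would call `waitBounded(queue, resolveQueueWaitTimeoutMs(env))` and proceed to release the lock on either outcome, logging the `"timed-out"` case rather than blocking on it.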
Dependency Changes Detected
This PR changes dependency-related files. Maintainers should confirm these changes are intentional. Changed files:
Maintainer follow-up:
Closing record (not a mystery): this PR is superseded by upstream openclaw Both fix the same openclaw#195 orphan — a
The two are complementary, not contradictory: bounding the helper is a reasonable defense in its own right. Upstream just makes it unnecessary for the openclaw#195 case by removing the lock-hold entirely. The bound remains applied on the VM as the interim fix (sha |
…penclaw#76262) * fix(msteams): rebase SDK migration onto current main Reapply the msteams SDK migration (originally on feat/msteams-sdk-migration) on top of upstream/main, resolving conflicts with parallel msteams work that landed upstream during our session. What got applied vs decisions made: CLEANLY APPLIED (3-way patch): - monitor.ts, monitor-handler.ts, polls.ts, reply-stream-controller.ts/.test.ts, reply-dispatcher.ts, attachments/download.ts, monitor.lifecycle.test.ts, monitor-handler/message-handler.ts, monitor-handler.types.ts, etc. - streaming-message.ts + .test.ts deletions WHOLESALE TAKE FROM ORIGINAL BRANCH (partial 3-way left broken cross-refs): - sdk.ts, sdk.test.ts, messenger.ts, feedback-reflection.ts, send-context.ts, send.test.ts KEPT UPSTREAM (deferred for separate cleanup): - extensions/msteams/package.json (still has jsonwebtoken/jwks-rsa per Peter's b3bc60a incremental approach) - src/plugins/contracts/package-manifest.contract.test.ts (consistent with package.json) - pnpm-lock.yaml (avoids lockfile churn; pnpm install --frozen-lockfile clean) ADAPTED: - Dockerfile matrix-sdk-crypto check now wraps upstream's new retry-loop in the if-matrix-bundled gate KNOWN TEST FAILURES (need eyes — see PR comment): - attachments.test.ts: 1 fail (pre-existing — warn meta arg shape changed in our migration but test wasn't updated) - reply-dispatcher.test.ts: 6 fails (pre-existing — tests mock old TeamsHttpStream, not updated for our ctx.stream rewrite) - send.test.ts: 4 fails (NEW from merge — upstream's send.ts changed media loading; our mocks need updating or take upstream's send.test.ts wholesale) UPSTREAM COMMITS POTENTIALLY MISSED (in wholesale-take files): - 08c4af0 fix(msteams): accept conversation id allowlists - e1840b8 fix(msteams): bind global audience tokens to app id - Channels turn-kernel refactor (ffe67e9 / 1ead1b2 / 9a9cd0c) — may be partially preserved in cleanly-patched files Static checks pass: pnpm check:changed is green (typecheck, 
lint, contract tests, import cycles, etc.). Manual testing required before merge. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(msteams): preserve thread routing for channel and group-chat replies - monitor.ts: adaptSdkContext now uses ctx.reply() for channel and groupChat conversations (so the SDK threads outbound activities to the inbound's replyToId/serviceUrl) and ctx.send() only for personal DMs (where reply()'s blockquote-prepend is ugly). - messenger.ts: sendProactively passes resolvedThreadId on the non-thread fallback path so channel @mentions that fall through outbound.ts -> send.ts still land in the original thread instead of top-level. Live-validated: channel @mention -> bot replies in thread, threaded reply -> bot replies in same thread, no top-level leakage. * fix(msteams): tag outbound SDK calls with OpenClaw User-Agent - user-agent.ts: add buildOpenClawUserAgentFragment() that returns just 'OpenClaw/<version>'. The SDK's Client.clone merges this with its own 'teams.ts[apps]/<sdk-version>' identifier — passing the full buildUserAgent() here would double-print the SDK token. - sdk.ts: pass the fragment via AppOptions.client.headers['User-Agent'] so the Teams backend can identify OpenClaw traffic for usage telemetry. Final UA looks like 'OpenClaw/<openclaw-version> teams.ts[apps]/<sdk-version>'. * fix(msteams): handle StreamCancelledError when user presses Stop mid-stream The new SDK throws StreamCancelledError synchronously from stream.emit/update when the user pressed Stop in Teams: Teams replies 403 to the next chunk update, the SDK flips _canceled, and any subsequent emit() throws. The old custom TeamsHttpStream either swallowed cancel or didn't expose this exception type, so the migration inherited an SDK behavior the original code didn't have to handle. Symptom on 2026-05-05: pressing Stop during a streaming reply caused an unhandled promise rejection that crashed the Node 24 process. 
Docker restarted the gateway about two minutes after each Stop click. Two related bugs surfaced once the crash was caught: the would-be block fallback re-delivered the full text as a second message (duplicate after Stop), and the typing-keepalive kept pulsing in Teams for the rest of the agent run because nothing told it to stop. reply-stream-controller.ts: - Wrap stream.update / stream.emit / stream.close in try/catch that swallows StreamCancelledError (matched by .name to dodge tsgo's SDK re-export resolution quirk). Latch a wasCanceled flag so subsequent calls short-circuit even if stream.canceled is stale. - preparePayload() returns undefined when the stream was canceled — the streamed prefix is already visible to the user, so dropping the payload prevents a duplicate block message from overriding the cancel intent. reply-dispatcher.ts: - Typing-keepalive gate now also checks streamController.wasCanceled() so typing pulses stop firing once Stop is observed. Otherwise the bot keeps pulsing for the rest of the (uncancellable) agent run. reply-stream-controller.test.ts: - 6 new regression tests cover: cancel-during-emit (the crash scenario), cancel-during-update, cancel-during-finalize, non-cancel error propagation, post-cancel inactivity, and dropped-payload-on-cancel. Live-validated: long streaming reply + Stop mid-stream -> stream freezes, no duplicate message, no zombie typing, container stays healthy. * fix(msteams): allow Bearer-token retry on Skype CDN attachment downloads Teams puts inline DM images and clipboard-pasted images on *.asm.skype.com URLs (e.g. us-api.asm.skype.com/v1/objects/<id>/views/imgo). The download path in attachments/download.ts already does a plain GET first and falls back to a Bearer-token retry on 401/403 — but the retry was gated on the URL being in DEFAULT_MEDIA_AUTH_HOST_ALLOWLIST. 
asm.skype.com hosts were in DEFAULT_MEDIA_HOST_ALLOWLIST (download permitted) but not in the auth-host list, so a 401 plain-GET response skipped the retry and surfaced as a missing image to the agent. Add asm.skype.com and ams.skype.com to the auth allowlist so openclaw attempts the Bearer-token retry consistently, matching how it treats the other CDN/Bot-Framework hosts already in the list. Note: this does not unblock all clipboard-pasted DM images — for at least some tenants asm.skype.com rejects the Bot Framework token (returns 401 even with auth). Routing those URLs through <serviceUrl>/v3/attachments/... the way openclaw#62219 already handles HTML-wrapped attachments is a separate follow-up. The +button 'Upload from this device' path works today because Teams generates an attachment with an HTML wrapper that triggers the existing BF v3 attachments fallback in monitor-handler/inbound-media.ts. * fix(msteams): align docker-compose msteams port default with plugin default The plugin defaults webhook.port to 3978 (the Bot Framework standard used in Microsoft samples) and listens on whatever the operator sets there. The docker-compose.yml port mapping was exposing ${OPENCLAW_MSTEAMS_PORT:-3000}:3000 which only works for operators who explicitly set webhook.port to 3000. Default-config users would have the plugin listening on 3978 inside the container while compose forwarded 3000, causing connection refused. Realign to ${OPENCLAW_MSTEAMS_PORT:-3978}:3978 so a default-config docker compose up Just Works with Teams. Operators wanting a custom port override both webhook.port in openclaw.json and OPENCLAW_MSTEAMS_PORT env var. * fix(msteams): post-rebase reconciliation with main Three follow-ups after rebasing the SDK migration onto current main: - reply-dispatcher.ts: rename createChannelReplyPipeline to its post-rebase identifier createChannelMessageReplyPipeline (the plugin-sdk barrel renamed it during the 1454-commit rebase window). 
- reply-dispatcher.ts: tighten the typing-keepalive onStartError signature to (err: unknown) to satisfy upstream's stricter type checks. - messenger.ts: drop the unconditional thread suffix on the bottom proactive fallback. The previous behavior threaded all top-level proactive sends when the stored ref had a threadId, which contradicts replyStyle='top-level' semantics (and breaks the new upstream test). Threading on the proactive path is preserved where it matters — the onRevoked branch within replyStyle==='thread' still passes resolvedThreadId, which is the original openclaw#55198 fix path. - attachments.test.ts: update the warn-call assertion to match the migration's inline message format (host=... error=...) — the structured meta object was being dropped by the logger formatter pre-migration. * feat(msteams): port streaming preview/progress features to ctx.stream While the SDK migration was open, upstream landed preview/progress/draft streaming features built on the OLD custom TeamsHttpStream class (which the migration deletes). This commit ports the user-visible parts of those features onto the new ctx.stream substrate so the migration doesn't lose ground: - pickInformativeStatusText: reads custom labels from msteams.streaming.progressDraft config via resolveChannelProgressDraftLabel. Falls back to the plugin-sdk default rotation. Pre-rebase used a hardcoded 4-string array. - streamMode resolution: "partial" (default, per-token streaming), "progress" (no tokens; preview card carries informative label that updates as tools run), or "block" (no native streaming). Mode is read from cfg.channels.msteams.streaming.preview. - progress-draft gate: createChannelProgressDraftGate gates informative updates so the rotating label only starts firing once meaningful work has begun (avoids flicker before the first tool call). - noteProgressWork() / pushProgressLine(): public methods on the controller for callers (typing keepalive ticks, tool-event callbacks) to signal work. 
pushProgressLine appends tool names as bullets above the rotating label when streaming.previewToolProgress is enabled. Wiring these into actual tool events is a separate follow-up. - preparePayload progress-mode path: when stream is active but no tokens streamed (progress mode) and a final text payload arrives, emit the text into the stream so the preview card transitions in place to the final reply on close(). reply-dispatcher: pass log + msteamsConfig + a stable progressSeed (${accountId}:${conversation.id}) to createTeamsReplyStreamController so the informative-label rotation is consistent across reconnects. What's NOT ported and why: - Live-edit-via-replaceInformativeWithFinal: the SDK's HttpStream natively accumulates emitted text + entities + channelData and flushes ONE final activity at close() using the same activity id as the preview. So the separate "replace informative with final" call from upstream is unnecessary — we get live-finalization for free via the SDK's design. - pushProgressLine triggers from tool events: needs reply-pipeline-side callbacks the new SDK migration didn't surface yet. Follow-up. Tests: existing 22 reply-stream-controller tests still pass (the new behaviors are additive). * feat(msteams): wire pipeline tool events to streaming progress + fix test debt Two follow-ups from yesterday's stopping point: 1. Wire pipeline events into the stream controller's progress-draft surface. reply-dispatcher's replyOptions now exposes onReasoningStream, onToolStart, onItemEvent, onPlanUpdate, onApprovalEvent, onCommandOutput callbacks that format each event via the channel-streaming helpers and route through streamController.pushProgressLine(). Mirrors the discord adapter's wiring. Also: - resolveChannelStreamingPreviewToolProgress + ...SuppressDefaultTool... so the dispatcher exposes suppressDefaultToolProgressMessages on its replyOptions when progress mode is on. 
- Switch disableBlockStreaming resolution to the channel-streaming helpers (resolveChannelPreviewStreamMode + resolveChannelStreamingBlockEnabled) so streaming.mode='block' and streaming.block.enabled=true are honored alongside the legacy blockStreaming boolean. 2. Fix the test debt that the rebase exposed: - reply-dispatcher.test.ts: drop the streamInstances + TeamsHttpStream mock pattern (file deleted by migration); replace with a streamMock provided via context.stream that mirrors the SDK's IStreamer shape (update/emit/close/canceled). Update assertions on sendInformativeUpdate -> stream.update, stream.update -> stream.emit. Drop the resumes-typing-between-segments test (no equivalent in the new ctx.stream model — the SDK's HttpStream doesn't have a 'between segments' notion; close ends the stream). - send.test.ts: fix two stale mock targets — loadOutboundMediaFromUrl comes from openclaw/plugin-sdk/outbound-media (not /msteams), and resolveMarkdownTableMode comes from openclaw/plugin-sdk/markdown-table-runtime (not /config-runtime). The previous mock paths were no-ops post-migration. All 854 msteams tests now pass (was 17 failing in 4 files yesterday). * fix(msteams): SDK streaming delta + use app.reply for proactive thread sends Two narrow regressions exposed by the @microsoft/teams.apps migration: - The SDK's HttpStream.emit appends each chunk to its internal buffer (`this.text += activity.text`), but the channel reply pipeline emits cumulative text on each chunk. Forwarding cumulative text into an appending sink produced "chunk1 + chunk1chunk2 + chunk1chunk2chunk3..." duplication for streamed (DM) replies. Track the emitted prefix length in the stream controller and only forward the new tail. - Replace the manual `${convId};messageid=${msgId}` URL construction in the proactive thread fallback with `app.reply()`, which builds the threaded conversation id via the SDK's own toThreadedConversationId helper. 
Mechanically equivalent today; removes coupling to Teams' URL format and tracks any future SDK changes. Also adds the `reply` method to the structural MSTeamsApp type so the refactor typechecks without casts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(msteams): bump @microsoft/teams.api and teams.apps to 2.0.10 2.0.10 adds support for the AAD v1 token issuer that the Bot Framework JWT validator needs. The minor version bump pulls teams.cards / common / graph along to 2.0.10 too. Add `@microsoft/teams.*` to `minimumReleaseAgeExclude` in pnpm-workspace.yaml because 2.0.10 was published <48h ago and the default `minimumReleaseAge: 2880` (~2 days) would otherwise reject it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * revert(msteams): remove asm.skype.com auth-host allowlist additions These hosts were added in dfc169d for inline DM image auth-retry, but the commit's own footnote acknowledges it doesn't actually unblock clipboard-pasted images (asm.skype.com rejects Bot Framework tokens in at least some tenants). The change is unrelated to the SDK migration and the user-visible bug it claimed to fix isn't fixed; lifting it out keeps this PR focused on the migration. Will land as a separate PR if the auth-allowlist consistency improvement is wanted on its own. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(msteams): typed ExpressAdapter helper, drop unknown-cast pyramid The monitor's SDK bootstrap had an awkward chain: httpServerAdapter: new ( (await import("@microsoft/teams.apps")) as unknown as { ExpressAdapter: new (app: unknown) => unknown; } ).ExpressAdapter(expressApp) as never, Three casts (`unknown`, structural shape literal, `never`) were a defensive workaround from when the SDK's hashed d.ts files tripped up tsgo. With the SDK's exports now resolving cleanly, the same import can be done with full types. 
- Extend the lazy `loadSdkModules()` cache to include `ExpressAdapter` alongside `App` so the dynamic import is shared. - Add `createMSTeamsExpressAdapter(serverOrApp)` helper in `sdk.ts` that encapsulates the lazy import and returns a properly-typed adapter instance. - Replace `httpServerAdapter`'s structural shape on `CreateMSTeamsAppOptions` with the SDK's own `IHttpServerAdapter` interface (re-exported from `@microsoft/teams.apps`). The call site in `monitor.ts` becomes a single typed call with no `any`, no `unknown`, no `as never`. The lazy-load behavior is preserved: nothing imports `@microsoft/teams.apps` at module load time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(msteams): unbreak tsgo:extensions on the ExpressAdapter helper CI's check-prod-types failed because the previous commit's typed helper used `typeof import("@microsoft/teams.apps").ExpressAdapter`, which tsc/tsgo's NodeNext resolution can't follow through the SDK's chained `export *` barrel: @microsoft/teams.apps/dist/index.d.ts: export * from "./http"; // folder with index.d.ts export * from "./app"; // single .d.ts file The folder re-export drops `ExpressAdapter` and `IHttpServerAdapter` from the namespace shape under `tsconfig.extensions.json` (passes under the per-extension `tsconfig.json` because of inherited `paths`). Same root cause as why we already model `MSTeamsApp` structurally (line 47 comment). Switch the ExpressAdapter side to the same structural-shape pattern: - Define `MSTeamsHttpServerAdapter` and `MSTeamsExpressAdapterCtor` locally. - Cast `m.ExpressAdapter` once inside `loadSdkModules` (the runtime export is fine; only the type surface is hidden). - `httpServerAdapter` on `CreateMSTeamsAppOptions` and the return type of `createMSTeamsExpressAdapter` use the local structural type. 
Net result: the call site in `monitor.ts` stays the cast-free single line the previous commit landed; the one remaining cast is confined to the SDK-loading helper with an explanatory comment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(msteams): drop unused jsonwebtoken/jwks-rsa deps The SDK migration removed all `import "jsonwebtoken"` / `import "jwks-rsa"` from source code (the SDK does JWT validation internally now), but the package.json entries and the matching `package-manifest.contract.test.ts` expectation were left orphaned. Drop both: - `extensions/msteams/package.json`: remove `jsonwebtoken` (^9), `jwks-rsa` (^4) from `dependencies` and `@types/jsonwebtoken` from `devDependencies`. - `src/plugins/contracts/package-manifest.contract.test.ts`: remove the two entries from msteams's `pluginLocalRuntimeDeps` expectation. - `monitor.lifecycle.test.ts`: extend the `./sdk.js` mock with the `createMSTeamsExpressAdapter` export added in the typed-helper cleanup, so the lifecycle suite still mounts after the deps drop. Lockfile regenerates accordingly. All msteams tests (865) pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(msteams): drop unused @microsoft/teams.api direct dep CI's deadcode:dependencies (knip) flagged @microsoft/teams.api as unused in extensions/msteams. The plugin source uses structural type aliases (MSTeamsActivityParams, MSTeamsActivityLike, etc.) to dodge tsgo resolution bugs with teams.api's hashed d.ts files, so it never imports teams.api directly. The package is brought in transitively via @microsoft/teams.apps; the only other reference is probe.test.ts's vi.mock("@microsoft/teams.api"), which works on the import-path string and doesn't require a direct dep declaration. Lockfile regenerates accordingly. tsgo:extensions, knip, and all 865 msteams tests pass. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(msteams): clear three CI gate failures (lint, contract, deprecated config API) Three CI checks flagged on the latest run; all three are msteams-local and unrelated to one another: - **check-lint** / **check-additional-extension-bundled**: `oxlint` flagged a redundant `as string[]` assertion in `reply-dispatcher.ts:431`. The preceding `every((s: unknown) => typeof s === "string")` already narrows the array type, so the cast does nothing. Drop it. - **checks-fast-contracts-plugins-c**: the `package-manifest.contract.test.ts` `pluginLocalRuntimeDeps` for msteams still expected `@microsoft/teams.api`, but the deadcode cleanup commit (8f4050f) dropped it from `extensions/msteams/package.json`. Remove it from the contract test too — `teams.api` is only present transitively via `teams.apps`, which is the reason knip flagged it. - **check-additional-runtime-topology-architecture**: the deprecated internal config API guard caught `messenger.ts:223` calling `getMSTeamsRuntime().config.loadConfig()`. Switch to `config.current()` to match the pattern used by phone-control, synology-chat, and matrix. Pre-existing failures on this run that are NOT msteams-related and not caused by this PR: `check-test-types` (errors in `src/agents/openai-transport-stream.test.ts` and `pi-embedded-runner/openai-stream-wrappers.test.ts`) and `macos-swift` (`hoistAwait` in `MacNodeRuntime.swift`). Leaving those for upstream. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(msteams): cast config.current() return to OpenClawConfig The previous commit switched `messenger.ts:223` from the deprecated `config.loadConfig()` to `config.current()` to satisfy the architecture guard, but `config.current()` returns a deeply-readonly type that's not assignable to the `Partial<OpenClawConfig>` parameter `resolveMarkdownTableMode` expects (a mutable type from the SDK contract). 
Phone-control, synology-chat, and matrix all cast at this seam — adopt the same pattern. Verified locally: tsgo:core, tsgo:extensions, check:architecture, and test:extensions:package-boundary:compile all pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(msteams): address PR review — pre-auth body limit, allowlist log level, /api/messages forwarder, narrow release-age exclude Four narrow fixes from the PR review (BradGroux + clawsweeper bot + galiniliev's plan), each its own concern: - **pre-auth-body-limit** (monitor.ts) — install `express.json({ limit: DEFAULT_WEBHOOK_MAX_BODY_BYTES })` before the bearer-presence gate and SDK route. Express memoizes the parsed body on the request, so the SDK's later `json()` becomes a no-op and our limit applies before any handler parses bodies. Closes the gap where a `Bearer garbage`-shaped attacker could force unbounded JSON parsing before token validation. - **allowlist-error-logging** (monitor.ts) — restore main's `runtime.error` level for the `msteams resolve failed` catch (was downgraded to `runtime.log` mid-merge). Graph allowlist resolution failures are security-relevant; they need to surface to operators. - **legacy-messages-route** (monitor.ts) — when `webhook.path` is set to a custom value, also accept POSTs on the legacy `/api/messages` path with a one-time deprecation warning, then re-enter the Express middleware chain on the configured path. Keeps existing Azure Bot registrations working through the transition. Cast-free (`expressApp(req, res, next)` works because `Application extends IRouter extends RequestHandler`). - **release-age-scope** (pnpm-workspace.yaml) — narrow `@microsoft/teams.*` glob to the single direct dep `@microsoft/teams.apps`. Future scoped packages no longer get a freshness-guard pass. Tests + checks: msteams suite (867), tsgo:core, tsgo:extensions, tsgo:test, lint:extensions, check:architecture, knip --dependencies, package-manifest contract, all green. 
Still pending from the review (separate commits): - auth-coverage-tests (Brad #1 + comment) — tests proving the SDK accepts `aud=<bot app id>` and rejects `aud=api.botframework.com`. - invoke-response-handling (Brad #2, codex P2) — file-consent invoke ack must return through the SDK invoke handler, not `ctx.sendActivity`. - stream-failure-fallback (codex P2, galin F5) — `streamFailed` latch so partial streams fall back to block delivery on non-cancel errors. - serviceurl-routing (Brad #4, codex P2) — proposed rebuttal pending empirical confirmation that `smba.trafficmanager.net/teams` routes to non-default-region conversations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(msteams): lock SDK auth contract — aud + v1/v2 issuer coverage Adds extensions/msteams/src/auth-coverage.test.ts driving ServiceTokenValidator and createEntraTokenValidator directly with jose-minted RS256 tokens against an in-memory JWKS (via JwksClient.prototype patch). Locks in the three contract cases @BradGroux flagged on openclaw#76262: aud=<bot app id> accepted, aud=api.botframework.com rejected even when appid/azp match, and v1/v2 issuers accepted for allowed tenant (disallowed tenant rejected). Drops a stale ambient module declaration in src/types/microsoft-teams-sdk.d.ts that was shadowing the SDK's real jwt-validator types with a long-renamed createServiceTokenValidator surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(msteams): route file-consent invokes through typed app.on, drop broken invokeResponse send Brad #2 / codex #4 on PR openclaw#76262 — `ctx.sendActivity({ type: "invokeResponse", ... })` no longer reaches Teams as an HTTP InvokeResponse on the new SDK; it becomes an outbound Bot Framework activity instead. Move file-consent accept/decline to typed `app.on("file.consent.accept|decline", ...)` handlers. 
The SDK's typed-route layer wraps a void return into `{ status: 200 }` (`app.process.js:130`), so the manual ack disappears. While in here, type `MSTeamsApp.on` properly. Borrowing the SDK's `App.on` directly fails because that function carries a `this: App<TPlugin>` constraint our structural alias can't satisfy, so we model an equivalent generic over `IRoutes` with route-specific overloads (`card.action`, `file.consent.*`, `activity`). The overloads work around a tsgo bug — the `@microsoft/teams.api` `Activity` discriminated union collapses to `any`, turning `ActivityRoutes` into a `[string]: RouteHandler<X, void>` index signature that swallows every typed `Out` not already void-compatible (card.action returns `AdaptiveCardActionResponse`; the others happen to include `void`). Real tsc resolves cleanly. Linked upstream: microsoft/typescript-go#1057. Other cleanups: - Cast-free call sites for `adaptSdkContext` (now returns `MSTeamsTurnContext` instead of `unknown`). - card.action error responses include `innerHttpError` per the SDK's `HttpError` shape requirement. - Activity catch-all also skips `fileConsent/invoke` now that it's typed-routed (parallel to the existing `adaptiveCard/action` skip). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(msteams): route SSO sign-in invokes through typed app.on, drop broken invokeResponse send Brad #2 / codex #4 on PR openclaw#76262, SSO half. Continue the typed-route migration: `signin/tokenExchange` and `signin/verifyState` now register via `app.on("signin.token-exchange" | "signin.verify-state", ...)`. Per the SDK's router, registering a user route with the same name as a system route removes the system default — so the SDK's built-in handlers (which would call `api.users.token.exchange` themselves and emit a `signin` event nobody currently subscribes to) are silenced, and only ours runs. 
The SDK wraps a void return into the HTTP 200 InvokeResponse, so the legacy `ctx.sendActivity({ type: "invokeResponse", ... })` ack — broken on the new SDK because it becomes an outbound BF activity instead of the HTTP response — is gone. The handler body is extracted from the activity-catch-all dispatch in `monitor-handler.ts` to a new `signin-invoke.ts`, parallel to `file-consent-invoke.ts`. `isSigninInvokeAuthorized` is now exported from `monitor-handler.ts` so the new handler can reuse it. The activity catch-all skips the SSO invoke names alongside the existing skips for `adaptiveCard/action` and `fileConsent/invoke`. `MSTeamsAppOn` overloads now cover the two SSO routes with their typed ctx (`ISignInTokenExchangeInvokeActivity` / `ISignInVerifyStateInvokeActivity`). Tests in `monitor-handler.sso.test.ts` were rewritten to call the extracted handler directly — the `registered.run(ctx)` shape no longer covers SSO, and the `expect(ctx.sendActivity).toHaveBeenCalledWith({ type: "invokeResponse" })` assertions were dropped to match the new contract (the SDK ack happens via the typed-route return value). Note on overlap with openclaw#77784 (Stefan Stüben, Microsoft): that PR is doing a much bigger SSO rework (sign-in card / sign-in-link / six-digit-code fallbacks plus a `ctx.auth` plumbed to plugin tools). This change is the small migration-correctness fix and is structured so openclaw#77784's SSO body changes drop into the typed-route registrations cleanly on rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(msteams): route message-submit (feedback) invokes through typed app.on Last invoke off the activity catch-all dispatch. `message/submitAction` (thumbs up/down on AI-generated messages) now registers via `app.on("message.submit", ...)`. 
Same shape as file-consent and SSO: handler body extracted to a new `feedback-invoke.ts`, the SDK wraps a void return into the HTTP 200 InvokeResponse, the broken `ctx.sendActivity({ type: "invokeResponse", ... })` line is gone, and the activity catch-all skips this invoke name alongside the others. `isFeedbackInvokeAuthorized` is exported from `monitor-handler.ts` so `feedback-invoke.ts` can reuse it. Tests in `monitor-handler.feedback-authz.test.ts` were rewritten to call the extracted handler directly — the old `handler.run(ctx)` shape no longer intercepts feedback, and `originalRun` was removed because the typed route is the dispatch point now. `MSTeamsAppOn` overload added with the typed `IMessageSubmitActionInvokeActivity` ctx, slotted between the SSO overloads and the `activity` catch-all so `activity` stays last. This leaves only `message`, `conversationUpdate`, and `messageReaction` flowing through `app.on("activity", ...)` → `handler.run`. Promoting those is the path to deleting the catch-all entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(msteams): fall back to block delivery when partial-mode stream fails mid-flight codex #5 / Galin F5 on PR openclaw#76262. `reply-stream-controller.ts` previously re-threw any non-cancel error from `stream.emit` during partial streaming and from `stream.emit`/`stream.close` during finalize. Combined with `preparePayload` suppressing block delivery once `tokensEmitted` was true, that meant a network blip or API error mid-stream produced a truncated reply with no recovery — the user saw the prefix that made it through and nothing else. Add a `streamFailed` latch parallel to `canceledLocally` / `tokensEmitted`: - `onPartialReply`: catch non-cancel errors, set `streamFailed = true`, log a warn, don't propagate (the pipeline must keep running so `preparePayload` can decide). - `preparePayload`: when `tokensEmitted && streamFailed`, fall through to block delivery instead of suppressing. 
The user may see a duplicate (streamed prefix + full block reply); intentional — matches the pre-migration `TeamsHttpStream.hasContent` recovery and is better than truncated-only. - `finalize`: same latch + warn on non-cancel close failure, swallow rather than throw. The streamed content already reached the user; the closing activity (AI-Generated marker, feedback channelData) is the only loss, not worth blowing up the dispatcher. - `isStreamActive` returns false once the stream has failed. New tests cover crash-mid-stream after tokens were emitted (assert block delivery payload is returned), happy-path no-duplicate behavior (assert `preparePayload` still suppresses when nothing failed), and finalize close-failure (assert no throw). The pre-existing "re-throws non-cancel" test was inverted to assert non-throwing latch behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(msteams): declare @microsoft/teams.api as a runtime dependency Type-only `import("@microsoft/teams.api/dist/...").TypeName` references in `sdk.ts` (added when typed `MSTeamsApp.on` overloads were introduced) are picked up by the `extension-runtime-dependencies` contract test as genuine runtime imports. Declaring `@microsoft/teams.api` as a direct dep makes the contract pass; the package was already coming in transitively via `@microsoft/teams.apps`. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(msteams): keep SSO on SDK signin routes
* test(msteams): avoid redundant signin handler assertion
* docs(msteams): clarify Teams cloud support
* fix(msteams): use current SDK string helper
* fix(msteams): gate SDK invoke side effects
* test(msteams): avoid implicit any in lifecycle tests
* fix(msteams): preserve SDK user agent and matrix check
* fix(msteams): expose SDK common dependency
* fix(msteams): use SDK user agent merge
* fix(msteams): fall back when stream close no-ops
* chore(msteams): drop unrelated merge artifacts
* chore(msteams): restore unrelated main files
* chore(msteams): restore unrelated main files
* chore(msteams): restore unrelated main files
* test(msteams): type stream close mock result
* fix(msteams): configure Teams cloud service URL
* chore(msteams): refresh shrinkwrap
* chore(deps): refresh shrinkwrap locks
* chore(ci): rerun guards after main sync
* chore(deps): refresh shrinkwrap for node 24
* chore(config): refresh docs baseline
* fix(msteams): preserve Teams SDK proactive references
* fix(msteams): harden SDK proactive sends
* fix(msteams): align service url contract
* test: fix bonjour beacon type narrowing
* fix(msteams): ignore ambient service url
* fix(msteams): fall through submit invokes
* test: align shrinkwrap override policy with Teams SDK deps
* fix(msteams): ack invoke routes promptly
* fix(msteams): support china cloud boundaries
* test: sync PR with current CI gates
* test: isolate channel setup registry metadata
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Problem
On a WhatsApp deployment, video generation rendered successfully but failed to deliver (`Video generation failed: Video generation completion delivery failed after successful generation`), and the requester's session was wedged for ~17 minutes per attempt.
Root cause: a turn interrupted via `sessions_yield` can leave the agent's `_agentEventQueue` promise pending indefinitely. `waitForSessionEventQueue` (`attempt.session-lock.ts`) sits on every session-write-lock release path (the in-loop waits in the embedded run and the cleanup-time `acquireForCleanup`) and awaited that promise unbounded. So the interrupted turn's session write lock was never released until the `maxHoldMs` watchdog (~17 min). That orphaned lock dead-locked the background media completion-wake, which acquires the same session lock to inject the result and times out at 60s → delivery fails, session wedged.
Confirmed in production: `sessions_yield abort settle timed out` → `SessionWriteLockTimeoutError (timeout 60000ms)` → `Media generation completion wake failed; requester session was not woken` → `[session-write-lock] releasing lock held for 1071173ms (max=1020000ms)` (watchdog, ~17.85 min). `acquireSessionWriteLock`'s `respectMaxHold: !heldByThisProcess` means an in-process contender can never reclaim a self-held lock, so only the watchdog frees it.
Fix
Bound the wait in `waitForSessionEventQueue` (`OPENCLAW_SESSION_EVENT_QUEUE_WAIT_TIMEOUT_MS`, default 5s; 250ms under `OPENCLAW_TEST_FAST`). Because every release path funnels through this helper, bounding it guarantees the lock releases within a bounded window on a wedged turn instead of waiting on the watchdog. Normal turns are unaffected: the event queue drains in milliseconds, so the bound never triggers.
Test
`attempt.session-lock.test.ts`: a never-draining `_agentEventQueue` now lets `acquireForCleanup` complete and release the lock (asserts < 10s, vs the 300s maxHold the watchdog would otherwise wait). Full `attempt.session-lock` suite passes (35/35); `tsgo:core` + `tsgo:core:test` clean.
Validation
Deployed to the live deployment; a real WhatsApp video request rendered and delivered end-to-end. (The fix only engages on the wedge path, covered by the unit test; the live request exercised normal delivery.)
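For readers who want a concrete picture of the bounded wait described under Fix, here is a minimal sketch. It is not the code in `attempt.session-lock.ts`: the names `resolveEventQueueWaitTimeoutMs` and `waitForQueueBounded`, the `"1"` convention for `OPENCLAW_TEST_FAST`, and the integer parsing of the override are assumptions made for illustration. Only the two environment variable names and the 5s / 250ms values come from the description.

```typescript
// Hypothetical sketch of a bounded event-queue wait. Function names and
// env parsing rules are illustrative assumptions, not the PR's code.
type Env = Record<string, string | undefined>;

const DEFAULT_WAIT_MS = 5_000; // default bound stated in the description
const TEST_FAST_WAIT_MS = 250; // bound stated for OPENCLAW_TEST_FAST

// Assumption: the override is a positive integer number of milliseconds,
// and OPENCLAW_TEST_FAST is switched on with the value "1".
function resolveEventQueueWaitTimeoutMs(env: Env): number {
  const raw = env["OPENCLAW_SESSION_EVENT_QUEUE_WAIT_TIMEOUT_MS"];
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  if (Number.isFinite(parsed) && parsed > 0) return parsed;
  return env["OPENCLAW_TEST_FAST"] === "1" ? TEST_FAST_WAIT_MS : DEFAULT_WAIT_MS;
}

type WaitOutcome = "drained" | "timed-out";

// Resolve as soon as the queue settles (fulfilled or rejected), or after
// `timeoutMs`, whichever comes first. A queue that never settles can
// therefore not block the caller past the bound.
function waitForQueueBounded(
  queue: Promise<unknown>,
  timeoutMs: number,
): Promise<WaitOutcome> {
  return new Promise<WaitOutcome>((resolve) => {
    const timer = setTimeout(() => resolve("timed-out"), timeoutMs);
    const settle = (): void => {
      clearTimeout(timer); // do not keep the timer alive on the normal path
      resolve("drained");
    };
    queue.then(settle, settle);
  });
}
```

Passing a promise that never settles, such as `new Promise(() => {})`, makes `waitForQueueBounded` resolve to `"timed-out"` once the bound elapses, which is the wedge-path property the unit test exercises. On the normal path the timer is cleared as soon as the queue settles.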
Residual risk
Cleanup also calls `flushPendingToolResultsAfterIdle`, which has its own idle wait. If that can independently hang on a wedged turn, a companion bound may be needed; not observed in validation.
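If a companion bound does turn out to be needed, one possible shape is sketched below. Everything in it is hypothetical: `settlesWithin`, `CleanupSteps`, and `cleanupWithBounds` are illustrative names, and the two step callbacks merely stand in for the event-queue wait and `flushPendingToolResultsAfterIdle`. The sketch only shows the ordering idea, namely that each wait on the cleanup path is bounded and the lock release sits in a `finally`.

```typescript
// Hypothetical sketch: bound every wait on the cleanup path and release
// the lock in `finally`. None of these names exist in the codebase.

// Resolves to true if `work` settled within `ms`, false otherwise.
function settlesWithin(work: Promise<unknown>, ms: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    const done = (): void => {
      clearTimeout(timer);
      resolve(true);
    };
    work.then(done, done);
  });
}

interface CleanupSteps {
  waitForEventQueue: () => Promise<unknown>; // stand-in for the event-queue wait
  flushAfterIdle: () => Promise<unknown>; // stand-in for the idle flush
  releaseLock: () => void;
}

// Runs both waits with the same bound and always releases the lock.
// Returns the names of the steps that hit their bound, for logging.
async function cleanupWithBounds(
  steps: CleanupSteps,
  boundMs: number,
): Promise<string[]> {
  const timedOut: string[] = [];
  try {
    if (!(await settlesWithin(steps.waitForEventQueue(), boundMs))) {
      timedOut.push("eventQueue");
    }
    if (!(await settlesWithin(steps.flushAfterIdle(), boundMs))) {
      timedOut.push("flushAfterIdle");
    }
  } finally {
    steps.releaseLock(); // runs even if a step throws synchronously
  }
  return timedOut;
}
```

Under this shape the worst-case hold time on a wedged turn is roughly the sum of the per-step bounds rather than the watchdog interval. Whether the idle flush can actually hang is still the open question noted above.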