fix(matrix): contain sync outage failures #62779
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69a8b95c3f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR hardens the Matrix integration against homeserver sync outages by making Matrix startup wait for actual sync readiness and by ensuring monitor/background work is owned and contained within the Matrix channel lifecycle (preventing process-wide unhandled rejections).
Changes:
- Add Matrix sync-state typing/helpers and wire Matrix sync lifecycle events (`sync.state`, `sync.unexpected_error`) through the MatrixClient bridge.
- Make `MatrixClient.start()` wait for initial ready sync states (with timeout / fatal handling) before reporting startup success.
- Introduce a centralized monitor task runner + sync lifecycle/status controllers so detached monitor work is tracked, drained on shutdown, and fatal sync errors fail the channel task (not the whole process).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| extensions/matrix/src/matrix/sync-state.ts | Defines sync state type + readiness/disconnected/terminal helpers. |
| extensions/matrix/src/matrix/sdk/types.ts | Extends Matrix client event map with sync lifecycle events. |
| extensions/matrix/src/matrix/sdk.ts | Waits for initial sync readiness; bridges SDK sync events; tracks current sync state. |
| extensions/matrix/src/matrix/sdk.test.ts | Adds tests for startup readiness gating, unexpected sync errors, and startup timeout. |
| extensions/matrix/src/matrix/monitor/task-runner.ts | Adds tracked detached-task runner + idle draining for shutdown containment. |
| extensions/matrix/src/matrix/monitor/sync-lifecycle.ts | Adds sync fatality ownership to fail the channel task on unexpected STOPPED / sync fatal events. |
| extensions/matrix/src/matrix/monitor/sync-lifecycle.test.ts | Tests fatal sync handling and intentional shutdown STOPPED handling. |
| extensions/matrix/src/matrix/monitor/status.ts | Adds Matrix monitor status controller to publish starting/healthy/error/stopped snapshots. |
| extensions/matrix/src/matrix/monitor/index.ts | Wires status + lifecycle + task runner into the Matrix monitor and abort/fatal handling. |
| extensions/matrix/src/matrix/monitor/index.test.ts | Adds tests for status publishing, detached-task rejection containment, and fatal sync propagation. |
| extensions/matrix/src/matrix/monitor/events.ts | Wraps key event handlers in contained tasks to prevent unhandled rejections. |
| extensions/matrix/src/channel.ts | Plumbs channel lifecycle status sink (ctx.setStatus) into Matrix monitor. |
| CHANGELOG.md | Documents the Matrix/gateway outage containment fix. |
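The startup readiness gating described in the overview can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the state names follow matrix-js-sdk's `SyncState` values, while `isReadyState`, `isTerminalState`, the `SyncSource` interface, and this standalone `waitForInitialSyncReady` signature are hypothetical stand-ins chosen for the example.

```typescript
// State names as used by matrix-js-sdk's SyncState enum.
type MatrixSyncState =
  | "PREPARED"
  | "SYNCING"
  | "CATCHUP"
  | "RECONNECTING"
  | "ERROR"
  | "STOPPED";

// Hypothetical helpers; the real sync-state.ts may classify states differently.
function isReadyState(state: MatrixSyncState): boolean {
  return state === "PREPARED" || state === "SYNCING";
}

function isTerminalState(state: MatrixSyncState): boolean {
  return state === "STOPPED";
}

type SyncListener = (state: MatrixSyncState) => void;

// Hypothetical stand-in for the client bridge: subscribe returns an unsubscribe function.
interface SyncSource {
  onSyncState(listener: SyncListener): () => void;
}

// Resolves on the first ready state; rejects on a terminal state or on timeout.
function waitForInitialSyncReady(source: SyncSource, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;
    let unsubscribe: () => void = () => {};

    const finish = (err?: Error): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      unsubscribe();
      if (err) {
        reject(err);
      } else {
        resolve();
      }
    };

    timer = setTimeout(
      () => finish(new Error(`Matrix sync not ready after ${timeoutMs}ms`)),
      timeoutMs,
    );

    unsubscribe = source.onSyncState((state) => {
      if (isReadyState(state)) {
        finish();
      } else if (isTerminalState(state)) {
        finish(new Error(`Matrix sync reached ${state} during startup`));
      }
      // Any other state keeps waiting until the timeout fires.
    });

    // Covers a source that invoked the listener synchronously while subscribing.
    if (settled) unsubscribe();
  });
}
```

The point of the single `finish` path is that the timer and the listener are always released together, whichever outcome arrives first.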
Comments suppressed due to low confidence (1)
extensions/matrix/src/matrix/sdk.ts:468
`startSyncSession()` calls `this.client.startClient()` and then can reject from `waitForInitialSyncReady()` (timeout / unexpected error / terminal state). On those failure paths, the underlying matrix-js-sdk client remains started and will keep its sync loop running, while `MatrixClient.started` stays `false`, so subsequent retries may attempt a second `startClient()` on an already-running client.
Consider wrapping the post-startClient() startup phase in a try/catch (or try/finally) that stops the SDK client (e.g., via stopSyncWithoutPersist() / client.stopClient()) and resets any related state before rethrowing, so a failed startup does not leak background work or leave the instance in a half-started state.

    private async startSyncSession(opts: { bootstrapCrypto: boolean }): Promise<void> {
      if (this.started) {
        return;
      }
      await this.ensureCryptoSupportInitialized();
      this.registerBridge();
      await this.initializeCryptoIfNeeded();
      await this.client.startClient({
        initialSyncLimit: this.initialSyncLimit,
      });
      await this.waitForInitialSyncReady();
      if (opts.bootstrapCrypto && this.autoBootstrapCrypto) {
        await this.bootstrapCryptoIfNeeded();
      }
      this.started = true;
      this.emitOutstandingInviteEvents();
      await this.refreshDmCache().catch(noop);

Greptile Summary
This PR contains the Matrix channel startup/reliability fix: The three new modules —
Confidence Score: 5/5
Safe to merge. The fix is logically correct, well-tested, and scoped entirely to the Matrix extension. All remaining findings are P2 (documentation/style). The core correctness of the bug fix is sound: startup readiness gating, fatal-error routing through channel lifecycle, and background-task containment all behave correctly. The test coverage directly locks in the key invariants (startup timeout, fatal rejection, detached-task containment, intentional-shutdown STOPPED classification). No files require special attention.
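The "intentional-shutdown STOPPED classification" mentioned in the summary can be illustrated with a small sketch. The controller below is hypothetical (`createSyncLifecycle`, `markShuttingDown`, and the `fatal` promise are names invented for this example, not the API of `sync-lifecycle.ts`); it only shows the idea that a `STOPPED` state is fatal unless the channel itself asked for the shutdown.

```typescript
type SyncState = "PREPARED" | "SYNCING" | "CATCHUP" | "RECONNECTING" | "ERROR" | "STOPPED";

// Hypothetical controller: owns the decision of whether a sync event is fatal.
function createSyncLifecycle() {
  let shuttingDown = false;
  let fail: (err: Error) => void = () => {};

  // Rejects when sync fails fatally; the channel task can await or race on it.
  const fatal = new Promise<never>((_resolve, reject) => {
    fail = reject;
  });
  // Attach a no-op handler so an unobserved fatal error stays contained
  // instead of surfacing as a process-wide unhandled rejection.
  fatal.catch(() => {});

  return {
    fatal,
    // Call before stopping the client on purpose.
    markShuttingDown(): void {
      shuttingDown = true;
    },
    onSyncState(state: SyncState): void {
      if (state === "STOPPED" && !shuttingDown) {
        fail(new Error("Matrix sync stopped unexpectedly"));
      }
    },
    onUnexpectedError(err: Error): void {
      if (!shuttingDown) fail(err);
    },
  };
}
```

A caller would typically race its own work against `lifecycle.fatal`, so that the failure is scoped to the Matrix channel task rather than the whole process.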
💡 Codex Review
Reviewed commit: dd310b208d
dd310b2 to 657e1ce
💡 Codex Review
Reviewed commit: 657e1ce06a
657e1ce to 9cd2677
💡 Codex Review
Reviewed commit: 9cd2677d7a
9cd2677 to d0bce95
d3ed4da to 843a52a
💡 Codex Review
Reviewed commit: 843a52a7c5
97e4a7d to 6b4662e
💡 Codex Review
Reviewed commit: 6b4662e7f2
274ea16 to 382e4e6
💡 Codex Review
Reviewed commit: 382e4e667a
🔒 Aisle Security Analysis
We found 3 potential security issue(s) in this PR:
1. 🟡 Unbounded background task tracking enables memory/CPU exhaustion and can block shutdown
Description
Vulnerable code:

    const inFlight = new Set<Promise<void>>();
    ...
    inFlight.add(trackedTask);
    ...
    while (inFlight.size > 0) {
      await Promise.allSettled(Array.from(inFlight));
    }

Recommendation
Introduce backpressure and bounded concurrency, and ensure tasks cannot hang forever. Options (can be combined):

    import pLimit from "p-limit";
    const limit = pLimit(25); // tune
    const runDetachedTask = (label: string, task: () => Promise<void>) => {
      // Explicit type: the promise refers to itself inside .finally().
      const trackedTask: Promise<void> = limit(async () => {
        await task();
      })
        .catch((err) => { /* log */ })
        .finally(() => inFlight.delete(trackedTask));
      inFlight.add(trackedTask);
      return trackedTask;
    };
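For the "tasks cannot hang forever" part of the recommendation, one possible approach is to bound the shutdown drain with a deadline. The sketch below is an assumption-laden illustration, not code from this PR: `createTaskTracker`, `run`, and `drain` are hypothetical names, and the tracker assumes every tracked promise swallows its own error and removes itself from the set when it settles.

```typescript
// Hypothetical tracker: tracks detached tasks and drains them with a deadline.
function createTaskTracker() {
  const inFlight = new Set<Promise<void>>();

  function run(task: () => Promise<void>): Promise<void> {
    // Explicit type because the promise refers to itself in its cleanup step.
    const tracked: Promise<void> = Promise.resolve()
      .then(task)
      .catch(() => {
        // A real runner would log the error here; swallowing it keeps the
        // rejection contained instead of becoming an unhandled rejection.
      })
      .then(() => {
        inFlight.delete(tracked);
      });
    inFlight.add(tracked);
    return tracked;
  }

  // Returns true if all tasks finished, false if the deadline expired first.
  async function drain(deadlineMs: number): Promise<boolean> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), deadlineMs);
    });
    try {
      // Loop because tasks may schedule further tasks while draining.
      while (inFlight.size > 0) {
        // Tracked promises never reject (see run), so Promise.all is safe here.
        const idle = Promise.all(Array.from(inFlight)).then(() => "idle" as const);
        const outcome = await Promise.race([idle, deadline]);
        if (outcome === "timeout") return false;
      }
      return true;
    } finally {
      clearTimeout(timer);
    }
  }

  return { run, drain, size: (): number => inFlight.size };
}
```

With this shape, shutdown can call `drain` with a finite budget and decide what to do (log, force-stop) when it returns `false`, rather than awaiting a hung task indefinitely. It does not cancel the stuck task itself; that would need cooperation from the task, for example via an abort signal.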
2. 🟡 Matrix startup abort misclassification via generic `AbortError` name check
Description
This is an error-class confusion issue that can lead to unintended early termination (availability impact) and altered cleanup semantics based on an error name that is not unique to the intended abort condition.
Vulnerable code:

    export function isMatrixStartupAbortError(error: unknown): boolean {
      return error instanceof Error && error.name === "AbortError";
    }

Recommendation
Use a distinct error type or marker that cannot be confused with generic abort errors from other libraries.
Option A (preferred): custom class +

    export class MatrixStartupAbortError extends Error {
      constructor() {
        super("Matrix startup aborted");
        this.name = "MatrixStartupAbortError";
      }
    }
    export function createMatrixStartupAbortError(): Error {
      return new MatrixStartupAbortError();
    }
    export function isMatrixStartupAbortError(err: unknown): err is MatrixStartupAbortError {
      return err instanceof MatrixStartupAbortError;
    }

Option B: add a non-enumerable symbol marker

    const kMatrixStartupAbort = Symbol.for("openclaw.matrix.startup_abort");
    export function createMatrixStartupAbortError(): Error {
      const e = new Error("Matrix startup aborted");
      // defineProperty keeps the marker non-enumerable, matching the option's description.
      Object.defineProperty(e, kMatrixStartupAbort, { value: true, enumerable: false });
      return e;
    }
    export function isMatrixStartupAbortError(err: unknown): boolean {
      return err instanceof Error && (err as any)[kMatrixStartupAbort] === true;
    }

Then keep handling generic
3. 🟡 MatrixClient.start() abort/timeout can leave Matrix sync running in background
Description
This can cause:
Vulnerable flow:
Vulnerable code:

    await this.client.startClient({ initialSyncLimit: this.initialSyncLimit });
    await this.waitForInitialSyncReady({ abortSignal: opts.abortSignal, timeoutMs: opts.readyTimeoutMs });

If
Recommendation
Ensure the underlying matrix-js-sdk client is stopped if startup fails after
For example, wrap the post-

    await this.client.startClient({ initialSyncLimit: this.initialSyncLimit });
    try {
      await this.waitForInitialSyncReady({ abortSignal: opts.abortSignal, timeoutMs: opts.readyTimeoutMs });
      throwIfMatrixStartupAborted(opts.abortSignal);
      if (opts.bootstrapCrypto && this.autoBootstrapCrypto) {
        await this.bootstrapCryptoIfNeeded(opts.abortSignal);
      }
      throwIfMatrixStartupAborted(opts.abortSignal);
      this.started = true;
      this.emitOutstandingInviteEvents();
      await this.refreshDmCache().catch(noop);
    } catch (e) {
      // stop any background sync started by startClient()
      this.stopSyncWithoutPersist();
      throw e;
    }

This guarantees abort/timeout/unexpected errors during startup do not leave a running sync session behind.
Analyzed PR: #62779 at commit
Last updated on: 2026-04-08T18:43:14Z
21250c5 to 27f9d85
💡 Codex Review
Reviewed commit: 27f9d850c1
💡 Codex Review
Reviewed commit: 795ef740b7
795ef74 to 901bb76
Merged via squash.
Thanks @gumadeiras!
💡 Codex Review
Reviewed commit: 901bb767b5
Merged via squash.
Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
* release: mirror bundled channel deps at root (openclaw#63065) Merged via squash. Prepared head SHA: ac26799 Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com> Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com> Reviewed-by: @scoootscooob * fix(test): keep warn log capture under openclaw temp dir * revert: undo background alive review findings fix * feat: add qa character vibes eval * test: stabilize plugin boundary invariants * test: isolate agent gateway cli command mocks * test: skip duplicate package boundary wrapper in ci * test: fix postpublish verifier sidecar handling * test: keep status tests off live usage probes * auto-reply: type status auth overrides * plugins: read contract inventory from manifests * test: inline cli metadata channel fixture * ci: skip duplicate full extension shard * test: isolate discord directory live token env * test: keep followup runner memory mock complete * ci: split parallel full suite into leaf shards * test: guard loader fixtures against broad sdk imports * test: keep bundled channel entry smokes descriptor-only * ci: reduce full suite test parallelism * test: avoid bundled test api smokes in matrix and telegram * test: keep discord and irc entry smokes descriptor-only * test: keep web provider artifact coverage manifest-only * test: keep provider policy artifact coverage narrow * test: keep web provider artifact test in boundary * test: keep status message tests off auth auto-detection * status: avoid plugin lookup for direct channel model overrides * channels: fast-path direct model override matches * test: restore manifest-only web provider coverage * fix: allow blank TLS manual port default (openclaw#63134) (thanks @Tyler-RNG) * make port optional for TLS manual connections * fix: restrict manual blank-port fallback to tls * fix: allow blank TLS manual port default (openclaw#63134) (thanks @Tyler-RNG) --------- Co-authored-by: Ayaan Zaidi <hi@obviy.us> * test: fix 
full suite CI test isolation * fix: align LLM idle timeout policy * test: exercise models json file mode without provider discovery * test: keep shared dm policy contract off channel facades * test: keep web provider artifact test in boundary * test: keep kilocode provider tests on plugin-owned helpers * ci: restore sequential full suite tests * test: keep public artifact coverage on cheap boundaries * test: keep openclaw tools registration tests on a fast shell * test: keep bundled metadata sidecar scan inventory-only * docs(inferrs): fix Gemma model id from gg-hf-gg to google (openclaw#62586) * fix: harden bundled plugin dependency release checks * ci: isolate full suite leaf shards * test: keep openclaw tools registration policy pure * fix: support Codex CLI QA auth * feat: add QA character eval reports * docs: document QA character eval workflow * refactor: dedupe media generation tool helpers * refactor: dedupe internal helper glue * refactor: dedupe shared helper branches * refactor: dedupe browser navigation guard tests * refactor: dedupe config and subagent tests * refactor: dedupe test helpers and script warning filter * refactor: dedupe plugin test harnesses * refactor: dedupe media runtime test mocks * refactor: dedupe plugin metadata test helpers * refactor: dedupe firecrawl and directive helpers * refactor: dedupe exec defaults tests * refactor: dedupe approval runtime tests * refactor: dedupe matrix exec approval tests * refactor: dedupe telegram exec approval tests * refactor: dedupe doctor codex oauth tests * refactor: dedupe agent command test fixtures * refactor: dedupe embedding provider test fixtures * refactor: share html entity tool call decoding * fix: keep minimax provider mocks package-local * test: keep pdf and update-plan registration tests pure * test: keep model reasoning override coverage on merge helpers * fix: default OpenAI reasoning effort to high * test: keep kimi implicit provider tests on provider catalog * fix(build): prune 
stale bundled plugin node_modules * fix(build): address bundled plugin prune review * fix(build): honor postinstall disable flag * test: keep chutes implicit provider tests on provider catalog * fix(plugin-sdk): export channel plugin base * docs: reorder changelog entries * test: keep bundled web-search owner checks on public artifacts * fix(build): keep tsdown prune best-effort * test: trust gateway exec fixture node path * fix: keep runtime task test harness behind task seams * test: explain gateway exec fixture trust * Reply: surface OAuth reauth failures (openclaw#63217) Merged via squash. Prepared head SHA: 68b7ffd Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky * test: make character eval scenario natural * feat: add character eval model options * test: keep pi fs workspace tests on fs tool factories * test: keep media runtime tests on same-directory provider mocks * fix(android): auto-resume pairing approval * fix(android): prefer bootstrap auth on qr pairing * fix(android): reset auth on new setup codes * fix(android): tighten pairing retry behavior * fix(android): prefer stored device auth after pairing * fix: restore android qr pairing flow (openclaw#63199) * fix(auto-reply): strip leading NO_REPLY tokens to prevent silent-reply leak (openclaw#63068) * fix(auto-reply): strip leading NO_REPLY tokens to prevent silent-reply leak * fix(auto-reply): preserve substantive NO_REPLY leading text * fix(agents): preserve ACP silent-prefix cumulative deltas * fix(auto-reply): harden silent-token streaming paths * fix(auto-reply): normalize glued silent tokens consistently --------- Co-authored-by: termtek <termtek@ubuntu.tail2b72cd.ts.net> * fix(gateway): clear auto-fallback model override on session reset (openclaw#63155) * fix(gateway): clear auto-fallback model override on session reset When `persistFallbackCandidateSelection()` writes a fallback 
provider override with `authProfileOverrideSource: "auto"`, the override was incorrectly preserved across `/reset` and `/new` commands. This caused sessions to keep using the fallback provider even after the user changed the agent config primary provider, because the session store override takes precedence over the config default. Now the override fields (`providerOverride`, `modelOverride`, `authProfileOverride`, `authProfileOverrideSource`, `authProfileOverrideCompactionCount`) are only carried forward when `authProfileOverrideSource === "user"` (i.e. explicit `/model` command). System-driven overrides are dropped on reset so the session picks up the current config default. Introduced in cb0a752 ("fix: preserve reset session behavior config") * fix(gateway): preserve explicit reset model selection * fix(gateway): track reset model override source * fix(gateway): preserve legacy reset model overrides * docs(changelog): add session reset merge note --------- Co-authored-by: termtek <termtek@ubuntu.tail2b72cd.ts.net> * test: stabilize ci test isolation * test: isolate volcengine byteplus auth resolver imports * fix: patch hono security advisories * fix: pass system prompt to codex cli * fix(plugins): prevent untrusted workspace plugins from hijacking bundled provider auth choices [AI] (openclaw#62368) * fix: address issue * fix: address review feedback * docs(changelog): add onboarding auth-choice guard entry * fix: address PR review feedback * fix: address PR review feedback * fix: address PR review feedback * fix: address PR review feedback * fix: address PR review feedback * fix: address PR review feedback * fix: address PR review feedback * fix: address PR review feedback --------- Co-authored-by: Devin Robison <drobison@nvidia.com> * test: isolate provider runtime test mocks * feat(plugins): support provider auth aliases * feat(memory): add grounded REM backfill lane (openclaw#63273) Merged via squash. 
Prepared head SHA: 4450f25 Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky * feat(memory): harden grounded REM extraction (openclaw#63297) Merged via squash. Prepared head SHA: e188b7e Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky * feat(ui): add dreaming diary controls and navigation (openclaw#63298) Merged via squash. Prepared head SHA: 0a2ae66 Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky * chore(ui): refresh zh-TW control ui locale * chore(ui): refresh zh-CN control ui locale * chore(ui): refresh pt-BR control ui locale * chore(ui): refresh de control ui locale * chore(ui): refresh es control ui locale * chore(ui): refresh ko control ui locale * chore(ui): refresh ja-JP control ui locale * chore(ui): refresh fr control ui locale * docs(matrix): tighten setup and config guidance * chore(ui): refresh tr control ui locale * chore(ui): refresh uk control ui locale * chore(ui): refresh pl control ui locale * chore(ui): refresh id control ui locale * test: stabilize full-suite execution * fix(matrix): contain sync outage failures (openclaw#62779) Merged via squash. 
Prepared head SHA: 901bb76 Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras * Align remote node exec event system messages with untrusted handling (openclaw#62659) * fix(nodes): downgrade remote exec system events * docs(changelog): add remote node exec event entry --------- Co-authored-by: Devin Robison <drobison@nvidia.com> * test: reuse image generate tool imports * test: reuse followup runner imports * docs(config): tighten wording in reference * test: harden provider mock isolation * fix(memory): accept embedded dreaming heartbeat tokens * test: harden Parallels macOS smoke fallback * build: narrow plugin SDK declaration build * fix(dotenv): block workspace runtime env vars (openclaw#62660) * fix(dotenv): block workspace runtime env vars Co-authored-by: zsx <git@zsxsoft.com> * docs(changelog): add workspace dotenv runtime-control entry * fix(dotenv): block workspace gateway port override --------- Co-authored-by: zsx <git@zsxsoft.com> Co-authored-by: Devin Robison <drobison@nvidia.com> * build: stage nostr runtime dependencies * fix: load QA live provider overrides * feat: parallelize character eval runs * auth: avoid external cli sync on profile upsert * test(doctor): mock memory-core runtime seam * auth: persist explicit profile upserts directly * Matrix: report startup failures as errors * fix(browser): harden browser control override loading (openclaw#62663) * fix(browser): harden browser control overrides * fix(lint): prepare boundary artifacts for extension oxlint * docs(changelog): add browser override hardening entry * fix(lint): avoid duplicate boundary prep --------- Co-authored-by: Devin Robison <drobison@nvidia.com> Co-authored-by: Devin Robison <drobison00@users.noreply.github.com> * test: reuse exec directive reply imports * test: reuse verbose directive reply imports * fix(browser): re-check interaction-driven navigations 
(openclaw#63226) * fix(browser): guard interaction-driven navigations * fix(browser): avoid rechecking unchanged interaction urls * fix(browser): guard delayed interaction navigations * fix(browser): guard interaction-driven navigations for full action duration * fix(browser): avoid waiting on interaction grace timer * fix(browser): ignore same-document hash-only URL changes in navigation guard * fix(browser): dedupe interaction nav guards * fix(browser): guard same-URL reloads in interaction navigation listeners * docs(changelog): add interaction navigation guard entry * fix(browser): drop duplicate ssrfPolicy props * fix(browser): tighten interaction navigation guards --------- Co-authored-by: Devin Robison <drobison@nvidia.com> * test: move directive state coverage to pure tests * fix: enable thinking support for the ollama api (openclaw#62712) Merged via squash. Prepared head SHA: c0b9950 Co-authored-by: hoyyeva <63033505+hoyyeva@users.noreply.github.com> Co-authored-by: BruceMacD <5853428+BruceMacD@users.noreply.github.com> Reviewed-by: @BruceMacD * Slack: treat ACP block text as visible output (openclaw#62858) Merged via squash. 
Prepared head SHA: 14f202e Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras * fix: fail fast on qa live auth errors * fix: fail fast across qa scenario wait paths * test: cover qa scenario wait failure replies * fix: sanitize qa missing-key replies * test: cover sanitized qa missing-key replies * fix: align qa wait cursor semantics * test: cover mixed-traffic qa wait cursors * fix: classify curated qa missing-key replies * test: cover curated qa missing-key reply classification * fix: harden qa missing-key provider messages * test: cover unsafe qa missing-key providers * docs(changelog): add qa auth fail-fast entry (openclaw#63333) (thanks @shakkernerd) * fix(matrix/doctor): migrate legacy channels.matrix.dm.policy 'trusted' (fixes openclaw#62931) (openclaw#62942) Merged via squash. Prepared head SHA: d9f553b Co-authored-by: lukeboyett <46942646+lukeboyett@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras * Memory/dreaming: feed grounded backfill into short-term promotion (openclaw#63370) Merged via squash. Prepared head SHA: 5dfe246 Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky * docs: update unreleased changelog * fix(gateway): classify dream diary actions * fix(memory): align dreaming status payloads * Memory/dreaming: harden grounded backfill follow-ups * test: reuse inline directive reply imports * Docs/memory: explain grounded backfill flows * fix(deps): patch basic-ftp advisory * test: move inline directive collisions to pure tests * Slack: dedupe partial streaming replies (openclaw#62859) Merged via squash. 
Prepared head SHA: cbecb50 Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com> Reviewed-by: @gumadeiras * test: replace exec directive e2e with pure coverage * fix(plugins): keep test helpers out of contract barrels (openclaw#63311) Merged via squash. Prepared head SHA: 769e90c Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com> Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com> Reviewed-by: @altaywtf * test: move cron heartbeat delivery coverage below full turns * fix: inter-session messages must not overwrite established external lastRoute (openclaw#58013) Merged via squash. Prepared head SHA: 820ea20 Co-authored-by: duqaXxX <12242811+duqaXxX@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman * fix(gateway): suppress announce/reply skip chat leakage (openclaw#51739) Merged via squash. Prepared head SHA: 2f53f3b Co-authored-by: Pinghuachiu <9033138+Pinghuachiu@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman * Slack: key turn-local dedupe by dispatch kind Scope Slack turn-local delivery dedupe by reply dispatch kind so identical tool and final payloads on the same thread do not collapse into one send. Expose the existing dispatcher kind on the public reply-runtime seam and cover the Slack tracker and preview-fallback paths with regression tests. * Dreaming: surface grounded scene lane (openclaw#63395) Merged via squash. 
Merged via squash.
Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
Summary
`gateway.channelMaxRestartsPerHour`.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
`matrix-js-sdk` startup as ready too early, then left background room-message/verification tasks detached from channel lifecycle ownership.
`matrix-js-sdk` emits long-lived sync state transitions after `startClient()` returns, and replayed/in-flight events during outage windows made orphaned async failures much easier to hit.
Regression Test Plan (if applicable)
`extensions/matrix/src/matrix/sdk.test.ts`, `extensions/matrix/src/matrix/monitor/index.test.ts`, `extensions/matrix/src/matrix/monitor/sync-lifecycle.test.ts`
User-visible / Behavior Changes
`starting`, `healthy`, `error`, and `stopped` transitions more accurately.
Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation:
Repro + Verification
Environment
Steps
Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
`pnpm build`.
`pnpm check` remains blocked by unrelated preexisting `tsgo` failures in `extensions/msteams/src/attachments.graph.test.ts`, `src/agents/subagent-registry.test.ts`, and `src/infra/host-env-security.test.ts`.
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write None.
`sync.state = ERROR` remains SDK-owned reconnect behavior and is not automatically escalated into channel restart.
`sync.unexpected_error`, unexpected `STOPPED`, startup readiness failure) to avoid fighting the SDK on transient reconnects.
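The fatal-versus-transient split described above can be illustrated with a minimal sketch. It is not the code from this PR: `waitForInitialSync`, `isReadyState`, and the choice of `PREPARED`/`SYNCING` as ready states are assumptions made for illustration, and a plain Node `EventEmitter` stands in for the Matrix client bridge. Only the event names `sync.state` and `sync.unexpected_error` come from the PR description.

```typescript
import { EventEmitter } from "node:events";

// State names follow matrix-js-sdk's sync states; which of them count as
// "ready" is an assumption of this sketch, not taken from the PR.
type SyncState = "PREPARED" | "SYNCING" | "CATCHUP" | "RECONNECTING" | "ERROR" | "STOPPED";

const READY_STATES: ReadonlySet<SyncState> = new Set<SyncState>(["PREPARED", "SYNCING"]);

function isReadyState(state: SyncState): boolean {
  return READY_STATES.has(state);
}

// Hypothetical helper: resolves on the first ready state and rejects on
// sync.unexpected_error, on STOPPED, or when the timeout elapses.
// ERROR and RECONNECTING are deliberately ignored so the SDK can retry.
function waitForInitialSync(client: EventEmitter, timeoutMs: number): Promise<SyncState> {
  return new Promise<SyncState>((resolve, reject) => {
    const cleanup = (): void => {
      clearTimeout(timer);
      client.off("sync.state", onState);
      client.off("sync.unexpected_error", onFatal);
    };
    const onState = (state: SyncState): void => {
      if (isReadyState(state)) {
        cleanup();
        resolve(state);
      } else if (state === "STOPPED") {
        cleanup();
        reject(new Error("sync stopped before becoming ready"));
      }
      // Any other state: keep waiting until a ready state or the timeout.
    };
    const onFatal = (err: Error): void => {
      cleanup();
      reject(err);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`sync not ready after ${timeoutMs}ms`));
    }, timeoutMs);
    client.on("sync.state", onState);
    client.on("sync.unexpected_error", onFatal);
  });
}
```

A caller would await this promise before reporting startup success. Because the rejection travels through the returned promise, whoever owns the channel task can catch it, rather than the failure surfacing as a process-wide unhandled rejection.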