fix(matrix): contain sync outage failures#62779

Merged
gumadeiras merged 14 commits into main from codex/matrix-channel-lifecycle-hardening
Apr 8, 2026

Conversation

@gumadeiras
Member

Summary

  • Problem: Matrix startup reported success before sync was actually ready, and detached Matrix monitor tasks could reject without an owner.
  • Why it matters: a homeserver outage could escalate from a channel-scoped failure into a process-wide unhandled rejection crash loop, bypassing gateway.channelMaxRestartsPerHour.
  • What changed: Matrix startup now waits for ready sync states, monitor status/fatal sync handling is owned inside the Matrix plugin, and detached monitor work is centrally contained and drained on shutdown.
  • What did NOT change (scope boundary): no core Matrix special-casing in gateway orchestration, no new config surface, no global unhandled-rejection policy change.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: the Matrix plugin treated matrix-js-sdk startup as ready too early, then left background room-message/verification tasks detached from channel lifecycle ownership.
  • Missing detection / guardrail: Matrix sync fatality and detached task rejection never fed back into the Matrix channel task, so global unhandled rejection policy killed the whole gateway before channel restart budgeting could apply.
  • Contributing context (if known): matrix-js-sdk emits long-lived sync state transitions after startClient() returns, and replayed/in-flight events during outage windows made orphaned async failures much easier to hit.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/matrix/src/matrix/sdk.test.ts, extensions/matrix/src/matrix/monitor/index.test.ts, extensions/matrix/src/matrix/monitor/sync-lifecycle.test.ts
  • Scenario the test should lock in: startup does not resolve before ready sync, startup times out/fails on sync fatal, detached monitor task failures do not escape as unhandled rejections, and fatal sync errors reject the Matrix channel task.
  • Why this is the smallest reliable guardrail: the bug lives at the Matrix SDK/monitor seam, below a full gateway e2e but above pure helper-local logic.
  • Existing test that already covers this (if any): none sufficiently covered the startup-readiness or detached-task ownership seam.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Matrix startup now waits for real sync readiness before the channel is considered started.
  • Fatal Matrix sync failures now stop/restart the Matrix channel instead of crashing the whole gateway process.
  • Matrix channel runtime status now reflects starting, healthy, error, and stopped transitions more accurately.

Diagram (if applicable)

Before:
[matrix startClient returns] -> [Matrix marked started] -> [background task rejects] -> [global unhandled rejection] -> [gateway exits]

After:
[matrix startClient returns] -> [wait for ready sync] -> [background task or sync fatal] -> [Matrix channel task rejects] -> [gateway channel restart policy]
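
The "After" flow can be sketched as a channel task that settles as soon as either the abort signal fires or a fatal sync error is reported. This is a minimal sketch under assumptions: `createFatalGate`, `waitForAbort`, and `runChannelTask` are hypothetical names used only to illustrate the ownership idea, and the plugin's actual wiring in monitor/index.ts may differ.

```typescript
// Hypothetical sketch: a channel task that rejects on fatal sync errors
// instead of letting them surface as process-wide unhandled rejections.
interface FatalGate {
  fail(error: Error): void;
  wait(): Promise<never>;
}

function createFatalGate(): FatalGate {
  let rejectWait: (error: Error) => void = () => {};
  const pending = new Promise<never>((_resolve, reject) => {
    rejectWait = reject;
  });
  // Attach a no-op handler so an early failure is never "unhandled".
  pending.catch(() => {});
  return {
    fail: (error) => rejectWait(error),
    wait: () => pending,
  };
}

function waitForAbort(signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve();
      return;
    }
    signal.addEventListener("abort", () => resolve(), { once: true });
  });
}

// Resolves on intentional shutdown, rejects on a fatal sync error.
async function runChannelTask(signal: AbortSignal, gate: FatalGate): Promise<void> {
  await Promise.race([waitForAbort(signal), gate.wait()]);
}
```

Because the rejection travels through the channel task's own promise, whoever awaits that task (here, presumably the gateway's channel restart policy) decides what happens next.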

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 24 / local repo workspace
  • Model/provider: N/A
  • Integration/channel (if any): Matrix
  • Relevant config (redacted): Matrix account with unreachable homeserver or fatal sync failure after startup

Steps

  1. Configure Matrix and start the gateway with an unreachable or failing homeserver.
  2. Let Matrix startup or replayed inbound event handling hit a sync/background-task failure.
  3. Observe whether the whole gateway exits or only the Matrix channel fails.

Expected

  • Matrix stays channel-scoped: startup waits for readiness, fatal sync failures reject the Matrix channel task, and gateway restart budgeting applies at the channel layer.

Actual

  • Before this change the gateway could exit from unhandled Matrix monitor rejections, bypassing channel restart controls.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: targeted Matrix tests for startup readiness, startup timeout, unexpected sync fatal, detached room-message failure containment, and intentional shutdown STOPPED handling; local pnpm build.
  • Edge cases checked: fatal sync error after startup, detached task rejection sink, intentional shutdown not misclassified as fatal, startup timeout branch with fake timers.
  • What you did not verify: full repo pnpm check remains blocked by unrelated preexisting tsgo failures in extensions/msteams/src/attachments.graph.test.ts, src/agents/subagent-registry.test.ts, and src/infra/host-env-security.test.ts.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: Matrix sync.state = ERROR remains SDK-owned reconnect behavior and is not automatically escalated into channel restart.
    • Mitigation: this PR only escalates clear fatal paths (sync.unexpected_error, unexpected STOPPED, startup readiness failure) to avoid fighting the SDK on transient reconnects.
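
The escalation policy described in this mitigation might be expressed as a pure classifier. The state names are the sync states named elsewhere in this PR; the `SyncSignal` shape and the `shouldFailChannel` name are assumptions made for illustration, not the plugin's API.

```typescript
type MatrixSyncState =
  | "PREPARED"
  | "SYNCING"
  | "CATCHUP"
  | "RECONNECTING"
  | "ERROR"
  | "STOPPED";

// Hypothetical event shape covering the three fatal paths listed above.
type SyncSignal =
  | { kind: "state"; state: MatrixSyncState; shutdownRequested: boolean }
  | { kind: "unexpected_error"; error: Error }
  | { kind: "startup_failed"; error: Error };

// Returns true only for the clear fatal paths; everything else stays SDK-owned.
function shouldFailChannel(signal: SyncSignal): boolean {
  switch (signal.kind) {
    case "unexpected_error":
    case "startup_failed":
      return true;
    case "state":
      // ERROR and RECONNECTING are left to the SDK's reconnect logic;
      // only a STOPPED that nobody asked for escalates.
      return signal.state === "STOPPED" && !signal.shutdownRequested;
  }
}
```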

Copilot AI review requested due to automatic review settings April 8, 2026 00:22
@openclaw-barnacle Bot added the channel: matrix, size: L, and maintainer labels Apr 8, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69a8b95c3f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/matrix/src/matrix/sdk.ts Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the Matrix integration against homeserver sync outages by making Matrix startup wait for actual sync readiness and by ensuring monitor/background work is owned and contained within the Matrix channel lifecycle (preventing process-wide unhandled rejections).

Changes:

  • Add Matrix sync-state typing/helpers and wire Matrix sync lifecycle events (sync.state, sync.unexpected_error) through the MatrixClient bridge.
  • Make MatrixClient.start() wait for initial ready sync states (with timeout / fatal handling) before reporting startup success.
  • Introduce a centralized monitor task runner + sync lifecycle/status controllers so detached monitor work is tracked, drained on shutdown, and fatal sync errors fail the channel task (not the whole process).
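
The tracked detached-task runner described in the last bullet can be sketched roughly as follows. This is an illustration built around the `inFlight` set and `waitForIdle()` loop quoted later in this conversation; `createTaskRunner` and the `onError` callback are hypothetical, and the real task-runner.ts may be structured differently.

```typescript
// Minimal sketch of a tracked detached-task runner; names and the error
// callback are illustrative, not the plugin's actual implementation.
function createTaskRunner(onError: (label: string, error: unknown) => void) {
  const inFlight = new Set<Promise<void>>();

  function runDetachedTask(label: string, task: () => Promise<void>): Promise<void> {
    const tracked: Promise<void> = Promise.resolve()
      .then(task)
      // Contain the failure here so it cannot become an unhandled rejection.
      .catch((error: unknown) => onError(label, error))
      .finally(() => {
        inFlight.delete(tracked);
      });
    inFlight.add(tracked);
    return tracked;
  }

  // Loop because a settling task may have scheduled further tasks.
  async function waitForIdle(): Promise<void> {
    while (inFlight.size > 0) {
      await Promise.allSettled(Array.from(inFlight));
    }
  }

  return { runDetachedTask, waitForIdle };
}
```

Event handlers would call `runDetachedTask(...)` without awaiting it, and shutdown would await `waitForIdle()` before releasing the client. The sketch assumes `onError` itself does not throw.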

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| extensions/matrix/src/matrix/sync-state.ts | Defines sync state type + readiness/disconnected/terminal helpers. |
| extensions/matrix/src/matrix/sdk/types.ts | Extends Matrix client event map with sync lifecycle events. |
| extensions/matrix/src/matrix/sdk.ts | Waits for initial sync readiness; bridges SDK sync events; tracks current sync state. |
| extensions/matrix/src/matrix/sdk.test.ts | Adds tests for startup readiness gating, unexpected sync errors, and startup timeout. |
| extensions/matrix/src/matrix/monitor/task-runner.ts | Adds tracked detached-task runner + idle draining for shutdown containment. |
| extensions/matrix/src/matrix/monitor/sync-lifecycle.ts | Adds sync fatality ownership to fail the channel task on unexpected STOPPED / sync fatal events. |
| extensions/matrix/src/matrix/monitor/sync-lifecycle.test.ts | Tests fatal sync handling and intentional shutdown STOPPED handling. |
| extensions/matrix/src/matrix/monitor/status.ts | Adds Matrix monitor status controller to publish starting/healthy/error/stopped snapshots. |
| extensions/matrix/src/matrix/monitor/index.ts | Wires status + lifecycle + task runner into the Matrix monitor and abort/fatal handling. |
| extensions/matrix/src/matrix/monitor/index.test.ts | Adds tests for status publishing, detached-task rejection containment, and fatal sync propagation. |
| extensions/matrix/src/matrix/monitor/events.ts | Wraps key event handlers in contained tasks to prevent unhandled rejections. |
| extensions/matrix/src/channel.ts | Plumbs channel lifecycle status sink (ctx.setStatus) into Matrix monitor. |
| CHANGELOG.md | Documents the Matrix/gateway outage containment fix. |

Comments suppressed due to low confidence (1)

extensions/matrix/src/matrix/sdk.ts:468

  • startSyncSession() calls this.client.startClient() and then can reject from waitForInitialSyncReady() (timeout / unexpected error / terminal state). On those failure paths, the underlying matrix-js-sdk client remains started and will keep its sync loop running, while MatrixClient.started stays false, so subsequent retries may attempt a second startClient() on an already-running client.

Consider wrapping the post-startClient() startup phase in a try/catch (or try/finally) that stops the SDK client (e.g., via stopSyncWithoutPersist() / client.stopClient()) and resets any related state before rethrowing, so a failed startup does not leak background work or leave the instance in a half-started state.

  private async startSyncSession(opts: { bootstrapCrypto: boolean }): Promise<void> {
    if (this.started) {
      return;
    }

    await this.ensureCryptoSupportInitialized();
    this.registerBridge();
    await this.initializeCryptoIfNeeded();

    await this.client.startClient({
      initialSyncLimit: this.initialSyncLimit,
    });
    await this.waitForInitialSyncReady();
    if (opts.bootstrapCrypto && this.autoBootstrapCrypto) {
      await this.bootstrapCryptoIfNeeded();
    }
    this.started = true;
    this.emitOutstandingInviteEvents();
    await this.refreshDmCache().catch(noop);

Comment thread extensions/matrix/src/matrix/sdk.ts Outdated
@greptile-apps
Contributor

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR contains the Matrix channel startup/reliability fix: MatrixClient.start() now blocks until the SDK reaches a ready sync state (PREPARED/SYNCING/CATCHUP) with a 30-second timeout, fatal sync errors are routed through the channel's own lifecycle (rejecting its task so gateway restart budgeting applies), and detached background handler failures are tracked and drained on shutdown instead of leaking as unhandled rejections.
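
The readiness gate summarized above could look roughly like the sketch below. It assumes a Node `EventEmitter` that emits `sync.state` and `sync.unexpected_error` (the event names used in this PR); `waitForReadySync` is a hypothetical helper, and treating RECONNECTING/ERROR as "keep waiting" during startup is an assumption rather than confirmed plugin behavior.

```typescript
import { EventEmitter } from "node:events";

const READY_STATES = new Set(["PREPARED", "SYNCING", "CATCHUP"]);

// Hypothetical helper: resolve on the first ready state, reject on a
// terminal state, an unexpected sync error, or when the timeout elapses.
function waitForReadySync(emitter: EventEmitter, timeoutMs = 30_000): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      emitter.off("sync.state", onState);
      emitter.off("sync.unexpected_error", onError);
    };
    const onState = (state: string) => {
      if (READY_STATES.has(state)) {
        cleanup();
        resolve();
      } else if (state === "STOPPED") {
        cleanup();
        reject(new Error("Matrix sync stopped before becoming ready"));
      }
      // Assumption: RECONNECTING / ERROR keep waiting; the SDK may recover.
    };
    const onError = (error: Error) => {
      cleanup();
      reject(error);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Matrix sync not ready after ${timeoutMs}ms`));
    }, timeoutMs);
    emitter.on("sync.state", onState);
    emitter.on("sync.unexpected_error", onError);
  });
}
```

Every exit path removes both listeners and clears the timer, so a failed startup does not leave stray handlers attached to the client.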

The three new modules — task-runner.ts, sync-lifecycle.ts, and status.ts — carve out clean, testable seams, and the added test coverage in sdk.test.ts, monitor/index.test.ts, and sync-lifecycle.test.ts directly locks in the startup-readiness, fatal escalation, and rejection-containment invariants.

Confidence Score: 5/5

Safe to merge. The fix is logically correct, well-tested, and scoped entirely to the Matrix extension.

All remaining findings are P2 (documentation/style). The core correctness of the bug fix is sound: startup readiness gating, fatal-error routing through channel lifecycle, and background-task containment all behave correctly. The test coverage directly locks in the key invariants (startup timeout, fatal rejection, detached-task containment, intentional-shutdown STOPPED classification).

No files require special attention.

Vulnerabilities

No security concerns identified. No new network calls, secrets handling changes, or permission surface changes were introduced.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/matrix/src/matrix/monitor/sync-lifecycle.ts
Line: 56-63

Comment:
**`waitForFatalStop` assumes single-caller ownership**

The resolve/reject callbacks are stored in module-level `let` slots, so a second concurrent call to `waitForFatalStop()` would silently overwrite the first caller's references — leaving its promise forever pending. This isn't a live bug (the only call site is the `Promise.race` in `index.ts`), but the invariant is implicit. A short guard comment or an early-throw would make the constraint explicit for future callers.

```suggestion
    async waitForFatalStop(): Promise<void> {
      if (fatalError) {
        throw fatalError;
      }
      // NOTE: only one concurrent caller is supported; a second call would
      // overwrite these slots and the first promise would never settle.
      await new Promise<void>((resolve, reject) => {
        resolveFatalWait = resolve;
        rejectFatalWait = (error) => reject(error);
      });
    },
```

How can I resolve this? If you propose a fix, please make it concise.
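
As an alternative to documenting the single-caller constraint, every waiter could share one promise. The following is only a sketch of that idea; `createFatalStopSignal`, `reportFatal`, and `reportCleanStop` are hypothetical names and do not come from sync-lifecycle.ts.

```typescript
// Hypothetical multi-caller-safe variant: every waiter shares one promise,
// so a second call cannot overwrite the first caller's callbacks.
function createFatalStopSignal() {
  let settle: { resolve: () => void; reject: (error: Error) => void } | undefined;
  const settled = new Promise<void>((resolve, reject) => {
    settle = { resolve, reject };
  });
  // Avoid an unhandled rejection when a fatal error arrives before anyone waits.
  settled.catch(() => {});

  return {
    waitForFatalStop: (): Promise<void> => settled,
    reportFatal: (error: Error) => settle?.reject(error),
    reportCleanStop: () => settle?.resolve(),
  };
}
```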

---

This is a comment left during a code review.
Path: extensions/matrix/src/matrix/monitor/status.ts
Line: 93-95

Comment:
**Unknown sync states don't update `connected`**

The fallback branch (for future/unknown SDK states that aren't `PREPARED`/`SYNCING`/`CATCHUP`/`RECONNECTING`/`ERROR`/`STOPPED`) updates `healthState` and `lastEventAt` but leaves `status.connected` unchanged. If the SDK ever introduces a state that arrives while `connected === true`, the status would report a connected client in an unknown health state. A brief comment clarifying this is intentional would help readers avoid adding `status.connected = false` here by mistake.

How can I resolve this? If you propose a fix, please make it concise.
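
The behavior under discussion can be illustrated with a small reducer. Only the "unknown state updates `healthState` and `lastEventAt` but keeps `connected`" rule comes from the comment above; the mapping for the known states and the `applySyncState` name are assumptions for illustration.

```typescript
// Illustrative reducer; the mapping for known states is an assumption.
interface MonitorStatus {
  connected: boolean;
  healthState: string;
  lastEventAt: number;
}

function applySyncState(status: MonitorStatus, state: string, now: number): MonitorStatus {
  switch (state) {
    case "PREPARED":
    case "SYNCING":
    case "CATCHUP":
      return { connected: true, healthState: "healthy", lastEventAt: now };
    case "RECONNECTING":
    case "ERROR":
      return { connected: false, healthState: "error", lastEventAt: now };
    case "STOPPED":
      return { connected: false, healthState: "stopped", lastEventAt: now };
    default:
      // Unknown/future SDK state: record it, but intentionally keep `connected`
      // as it was rather than guessing at connectivity.
      return { ...status, healthState: state.toLowerCase(), lastEventAt: now };
  }
}
```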

Reviews (1): Last reviewed commit: "fix(matrix): contain sync outage failure..."

Comment thread extensions/matrix/src/matrix/monitor/sync-lifecycle.ts
Comment thread extensions/matrix/src/matrix/monitor/status.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd310b208d

Comment thread extensions/matrix/src/matrix/monitor/sync-lifecycle.ts Outdated
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from dd310b2 to 657e1ce April 8, 2026 03:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 657e1ce06a

Comment thread extensions/matrix/src/matrix/sdk.ts
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from 657e1ce to 9cd2677 April 8, 2026 03:11

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9cd2677d7a

Comment thread CHANGELOG.md Outdated
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from 9cd2677 to d0bce95 April 8, 2026 04:18
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from d3ed4da to 843a52a April 8, 2026 04:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 843a52a7c5

Comment thread extensions/matrix/src/matrix/monitor/sync-lifecycle.ts Outdated
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from 97e4a7d to 6b4662e April 8, 2026 05:34

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b4662e7f2

Comment thread extensions/matrix/src/matrix/monitor/sync-lifecycle.ts
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from 274ea16 to 382e4e6 April 8, 2026 06:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 382e4e667a

Comment thread extensions/matrix/src/matrix/client/shared.ts Outdated
Comment thread extensions/matrix/src/matrix/monitor/sync-lifecycle.ts Outdated
@aisle-research-bot

aisle-research-bot Bot commented Apr 8, 2026

🔒 Aisle Security Analysis

We found 3 potential security issue(s) in this PR:

| # | Severity | Title |
| --- | --- | --- |
| 1 | 🟡 Medium | Unbounded background task tracking enables memory/CPU exhaustion and can block shutdown |
| 2 | 🟡 Medium | Matrix startup abort misclassification via generic AbortError name check |
| 3 | 🟡 Medium | MatrixClient.start() abort/timeout can leave Matrix sync running in background |

1. 🟡 Unbounded background task tracking enables memory/CPU exhaustion and can block shutdown

| Property | Value |
| --- | --- |
| Severity | Medium |
| CWE | CWE-400 |
| Location | extensions/matrix/src/matrix/monitor/task-runner.ts:7-31 |

Description

createMatrixMonitorTaskRunner tracks every detached handler promise in an in-memory Set until it settles. Event handlers for high-volume Matrix events (room.message, Reaction, verification.summary) are now executed via runDetachedTask, so an attacker who can generate many events (e.g., spamming messages/reactions) can:

  • Create an unbounded number of concurrently in-flight tasks (no concurrency limit/backpressure)
  • Cause sustained memory growth from the growing Set (and promise closures) when handlers are slow
  • Potentially block shutdown indefinitely because waitForIdle() loops until all tracked promises settle; if any handler hangs (network call stuck, deadlock), cleanup will never finish when mode === "persist"

Vulnerable code:

const inFlight = new Set<Promise<void>>();
...
inFlight.add(trackedTask);
...
while (inFlight.size > 0) {
  await Promise.allSettled(Array.from(inFlight));
}

Recommendation

Introduce backpressure and bounded concurrency, and ensure tasks cannot hang forever.

Options (can be combined):

  1. Concurrency limit / queue (preferred):
import pLimit from "p-limit";

const limit = pLimit(25); // tune
const runDetachedTask = (label: string, task: () => Promise<void>) => {
  const trackedTask = limit(async () => {
    await task();
  })
  .catch((err) => { /* log */ })
  .finally(() => inFlight.delete(trackedTask));

  inFlight.add(trackedTask);
  return trackedTask;
};
  2. Timeout + abort propagation: wrap tasks with an AbortSignal (or Promise.race timeout) so stuck handlers eventually settle.

  3. Bound the tracked set: if inFlight.size exceeds a safe threshold, drop/skip low-priority work or coalesce events (e.g., dedupe per room).

  4. In shutdown, consider a max wait for waitForIdle() and then proceed with releaseSharedClientInstance(client, "stop").
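
The last option above (a maximum wait for waitForIdle() during shutdown) might be sketched like this. `drainWithDeadline` is a hypothetical helper, and the caller is assumed to continue with its normal stop path whichever result comes back.

```typescript
// Hypothetical helper: wait for background work, but give up after maxWaitMs
// so a hung handler cannot block shutdown indefinitely.
async function drainWithDeadline(
  waitForIdle: () => Promise<void>,
  maxWaitMs: number,
): Promise<"idle" | "timed_out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), maxWaitMs);
  });
  try {
    return await Promise.race([waitForIdle().then(() => "idle" as const), deadline]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

A "timed_out" result does not cancel the stuck handlers; it only lets shutdown proceed, so it pairs naturally with the abort-propagation option.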

2. 🟡 Matrix startup abort misclassification via generic `AbortError` name check

| Property | Value |
| --- | --- |
| Severity | Medium |
| CWE | CWE-697 |
| Location | extensions/matrix/src/matrix/startup-abort.ts:13-15 |

Description

isMatrixStartupAbortError treats any Error with name === "AbortError" as an intentional Matrix startup abort.

  • createMatrixStartupAbortError() creates an Error and sets name = "AbortError"
  • isMatrixStartupAbortError() then matches solely on error.name === "AbortError"
  • Many underlying libraries (e.g., fetch/Matrix SDK request layer, undici) may throw abort-related errors with the same name for non-user-initiated reasons (timeouts, internal cancellations, transient network conditions)
  • This causes control-flow changes:
    • In runMatrixStartupMaintenance, such errors are rethrown instead of being logged as non-fatal, potentially aborting startup.
    • In monitorMatrixProvider, if the outer abort signal is already aborted, any AbortError will be treated as a startup abort and the function will cleanup("stop") and return (skipping the normal error propagation path).

This is an error-class confusion issue that can lead to unintended early termination (availability impact) and altered cleanup semantics based on an error name that is not unique to the intended abort condition.

Vulnerable code:

export function isMatrixStartupAbortError(error: unknown): boolean {
  return error instanceof Error && error.name === "AbortError";
}

Recommendation

Use a distinct error type or marker that cannot be confused with generic abort errors from other libraries.

Option A (preferred): custom class + instanceof

export class MatrixStartupAbortError extends Error {
  constructor() {
    super("Matrix startup aborted");
    this.name = "MatrixStartupAbortError";
  }
}

export function createMatrixStartupAbortError(): Error {
  return new MatrixStartupAbortError();
}

export function isMatrixStartupAbortError(err: unknown): err is MatrixStartupAbortError {
  return err instanceof MatrixStartupAbortError;
}

Option B: add a non-enumerable symbol marker

const kMatrixStartupAbort = Symbol.for("openclaw.matrix.startup_abort");

export function createMatrixStartupAbortError(): Error {
  const e = new Error("Matrix startup aborted");
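  // defineProperty defaults to enumerable: false, matching the "non-enumerable" intent
  Object.defineProperty(e, kMatrixStartupAbort, { value: true });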
  return e;
}

export function isMatrixStartupAbortError(err: unknown): boolean {
  return err instanceof Error && (err as any)[kMatrixStartupAbort] === true;
}

Then keep handling generic AbortError from fetch/SDK as ordinary errors (logged or retried) unless they carry the explicit marker/type.

3. 🟡 MatrixClient.start() abort/timeout can leave Matrix sync running in background

| Property | Value |
| --- | --- |
| Severity | Medium |
| CWE | CWE-400 |
| Location | extensions/matrix/src/matrix/sdk.ts:487-493 |

Description

MatrixClient.startSyncSession() starts the underlying matrix-js-sdk client (startClient()), then waits for a ready sync state. If the wait rejects (abort signal, timeout, or unexpected sync error), the function throws without stopping the underlying Matrix client, leaving a live /sync loop running.

This can cause:

  • Unexpected background network activity (continued syncing) after the caller believes startup failed
  • Processing/decryption of events without the intended higher-level application state/handlers being fully initialized
  • Potential resource exhaustion over time (leaked timers/requests), especially if callers retry startup repeatedly

Vulnerable flow:

  • sink: this.client.startClient(...) begins syncing
  • error paths: waitForInitialSyncReady(...) can reject on abort/timeout/unexpected error
  • missing cleanup: no stopClient()/stopSyncWithoutPersist() in these failure paths

Vulnerable code:

await this.client.startClient({ initialSyncLimit: this.initialSyncLimit });
await this.waitForInitialSyncReady({ abortSignal: opts.abortSignal, timeoutMs: opts.readyTimeoutMs });

If waitForInitialSyncReady rejects, the sync loop can continue running.

Recommendation

Ensure the underlying matrix-js-sdk client is stopped if startup fails after startClient().

For example, wrap the post-startClient() startup sequence in try/catch, and stop the client on any failure:

await this.client.startClient({ initialSyncLimit: this.initialSyncLimit });
try {
  await this.waitForInitialSyncReady({ abortSignal: opts.abortSignal, timeoutMs: opts.readyTimeoutMs });
  throwIfMatrixStartupAborted(opts.abortSignal);

  if (opts.bootstrapCrypto && this.autoBootstrapCrypto) {
    await this.bootstrapCryptoIfNeeded(opts.abortSignal);
  }
  throwIfMatrixStartupAborted(opts.abortSignal);

  this.started = true;
  this.emitOutstandingInviteEvents();
  await this.refreshDmCache().catch(noop);
} catch (e) {
  // stop any background sync started by startClient()
  this.stopSyncWithoutPersist();
  throw e;
}

This guarantees abort/timeout/unexpected errors during startup do not leave a running sync session behind.


Analyzed PR: #62779 at commit 901bb76

Last updated on: 2026-04-08T18:43:14Z

@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from 21250c5 to 27f9d85 April 8, 2026 17:47

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27f9d850c1

Comment thread extensions/matrix/src/matrix/monitor/sync-lifecycle.ts
Comment thread extensions/matrix/src/matrix/monitor/index.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 795ef740b7

Comment thread extensions/matrix/src/matrix/sdk.ts
Comment thread extensions/matrix/src/matrix/monitor/index.ts
@gumadeiras force-pushed the codex/matrix-channel-lifecycle-hardening branch from 795ef74 to 901bb76 April 8, 2026 18:41
@gumadeiras merged commit 0c00c3c into main Apr 8, 2026
8 checks passed
@gumadeiras deleted the codex/matrix-channel-lifecycle-hardening branch April 8, 2026 18:41
@gumadeiras
Member Author

Merged via squash.

Thanks @gumadeiras!


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 901bb767b5

Comment thread extensions/matrix/src/matrix/monitor/index.ts
eleqtrizit pushed a commit that referenced this pull request Apr 8, 2026
Merged via squash.

Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
greidron added a commit to greidron/openclaw that referenced this pull request Apr 10, 2026
* release: mirror bundled channel deps at root (openclaw#63065)

Merged via squash.

Prepared head SHA: ac26799
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Reviewed-by: @scoootscooob

* fix(test): keep warn log capture under openclaw temp dir

* revert: undo background alive review findings fix

* feat: add qa character vibes eval

* test: stabilize plugin boundary invariants

* test: isolate agent gateway cli command mocks

* test: skip duplicate package boundary wrapper in ci

* test: fix postpublish verifier sidecar handling

* test: keep status tests off live usage probes

* auto-reply: type status auth overrides

* plugins: read contract inventory from manifests

* test: inline cli metadata channel fixture

* ci: skip duplicate full extension shard

* test: isolate discord directory live token env

* test: keep followup runner memory mock complete

* ci: split parallel full suite into leaf shards

* test: guard loader fixtures against broad sdk imports

* test: keep bundled channel entry smokes descriptor-only

* ci: reduce full suite test parallelism

* test: avoid bundled test api smokes in matrix and telegram

* test: keep discord and irc entry smokes descriptor-only

* test: keep web provider artifact coverage manifest-only

* test: keep provider policy artifact coverage narrow

* test: keep web provider artifact test in boundary

* test: keep status message tests off auth auto-detection

* status: avoid plugin lookup for direct channel model overrides

* channels: fast-path direct model override matches

* test: restore manifest-only web provider coverage

* fix: allow blank TLS manual port default (openclaw#63134) (thanks @Tyler-RNG)

* make port optional for TLS manual connections

* fix: restrict manual blank-port fallback to tls

* fix: allow blank TLS manual port default (openclaw#63134) (thanks @Tyler-RNG)

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>

* test: fix full suite CI test isolation

* fix: align LLM idle timeout policy

* test: exercise models json file mode without provider discovery

* test: keep shared dm policy contract off channel facades

* test: keep web provider artifact test in boundary

* test: keep kilocode provider tests on plugin-owned helpers

* ci: restore sequential full suite tests

* test: keep public artifact coverage on cheap boundaries

* test: keep openclaw tools registration tests on a fast shell

* test: keep bundled metadata sidecar scan inventory-only

* docs(inferrs): fix Gemma model id from gg-hf-gg to google (openclaw#62586)

* fix: harden bundled plugin dependency release checks

* ci: isolate full suite leaf shards

* test: keep openclaw tools registration policy pure

* fix: support Codex CLI QA auth

* feat: add QA character eval reports

* docs: document QA character eval workflow

* refactor: dedupe media generation tool helpers

* refactor: dedupe internal helper glue

* refactor: dedupe shared helper branches

* refactor: dedupe browser navigation guard tests

* refactor: dedupe config and subagent tests

* refactor: dedupe test helpers and script warning filter

* refactor: dedupe plugin test harnesses

* refactor: dedupe media runtime test mocks

* refactor: dedupe plugin metadata test helpers

* refactor: dedupe firecrawl and directive helpers

* refactor: dedupe exec defaults tests

* refactor: dedupe approval runtime tests

* refactor: dedupe matrix exec approval tests

* refactor: dedupe telegram exec approval tests

* refactor: dedupe doctor codex oauth tests

* refactor: dedupe agent command test fixtures

* refactor: dedupe embedding provider test fixtures

* refactor: share html entity tool call decoding

* fix: keep minimax provider mocks package-local

* test: keep pdf and update-plan registration tests pure

* test: keep model reasoning override coverage on merge helpers

* fix: default OpenAI reasoning effort to high

* test: keep kimi implicit provider tests on provider catalog

* fix(build): prune stale bundled plugin node_modules

* fix(build): address bundled plugin prune review

* fix(build): honor postinstall disable flag

* test: keep chutes implicit provider tests on provider catalog

* fix(plugin-sdk): export channel plugin base

* docs: reorder changelog entries

* test: keep bundled web-search owner checks on public artifacts

* fix(build): keep tsdown prune best-effort

* test: trust gateway exec fixture node path

* fix: keep runtime task test harness behind task seams

* test: explain gateway exec fixture trust

* Reply: surface OAuth reauth failures (openclaw#63217)

Merged via squash.

Prepared head SHA: 68b7ffd
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky

* test: make character eval scenario natural

* feat: add character eval model options

* test: keep pi fs workspace tests on fs tool factories

* test: keep media runtime tests on same-directory provider mocks

* fix(android): auto-resume pairing approval

* fix(android): prefer bootstrap auth on qr pairing

* fix(android): reset auth on new setup codes

* fix(android): tighten pairing retry behavior

* fix(android): prefer stored device auth after pairing

* fix: restore android qr pairing flow (openclaw#63199)

* fix(auto-reply): strip leading NO_REPLY tokens to prevent silent-reply leak (openclaw#63068)

* fix(auto-reply): strip leading NO_REPLY tokens to prevent silent-reply leak

* fix(auto-reply): preserve substantive NO_REPLY leading text

* fix(agents): preserve ACP silent-prefix cumulative deltas

* fix(auto-reply): harden silent-token streaming paths

* fix(auto-reply): normalize glued silent tokens consistently

---------

Co-authored-by: termtek <termtek@ubuntu.tail2b72cd.ts.net>

* fix(gateway): clear auto-fallback model override on session reset (openclaw#63155)

* fix(gateway): clear auto-fallback model override on session reset

When `persistFallbackCandidateSelection()` writes a fallback provider
override with `authProfileOverrideSource: "auto"`, the override was
incorrectly preserved across `/reset` and `/new` commands. This caused
sessions to keep using the fallback provider even after the user changed
the agent config primary provider, because the session store override
takes precedence over the config default.

Now the override fields (`providerOverride`, `modelOverride`,
`authProfileOverride`, `authProfileOverrideSource`,
`authProfileOverrideCompactionCount`) are only carried forward when
`authProfileOverrideSource === "user"` (i.e. explicit `/model` command).
System-driven overrides are dropped on reset so the session picks up the
current config default.

Introduced in cb0a752 ("fix: preserve reset session behavior config")

* fix(gateway): preserve explicit reset model selection

* fix(gateway): track reset model override source

* fix(gateway): preserve legacy reset model overrides

* docs(changelog): add session reset merge note

---------

Co-authored-by: termtek <termtek@ubuntu.tail2b72cd.ts.net>

* test: stabilize ci test isolation

* test: isolate volcengine byteplus auth resolver imports

* fix: patch hono security advisories

* fix: pass system prompt to codex cli

* fix(plugins): prevent untrusted workspace plugins from hijacking bundled provider auth choices [AI] (openclaw#62368)

* fix: address issue

* fix: address review feedback

* docs(changelog): add onboarding auth-choice guard entry

* fix: address PR review feedback

* fix: address PR review feedback

* fix: address PR review feedback

* fix: address PR review feedback

* fix: address PR review feedback

* fix: address PR review feedback

* fix: address PR review feedback

* fix: address PR review feedback

---------

Co-authored-by: Devin Robison <drobison@nvidia.com>

* test: isolate provider runtime test mocks

* feat(plugins): support provider auth aliases

* feat(memory): add grounded REM backfill lane (openclaw#63273)

Merged via squash.

Prepared head SHA: 4450f25
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky

* feat(memory): harden grounded REM extraction (openclaw#63297)

Merged via squash.

Prepared head SHA: e188b7e
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky

* feat(ui): add dreaming diary controls and navigation (openclaw#63298)

Merged via squash.

Prepared head SHA: 0a2ae66
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky

* chore(ui): refresh zh-TW control ui locale

* chore(ui): refresh zh-CN control ui locale

* chore(ui): refresh pt-BR control ui locale

* chore(ui): refresh de control ui locale

* chore(ui): refresh es control ui locale

* chore(ui): refresh ko control ui locale

* chore(ui): refresh ja-JP control ui locale

* chore(ui): refresh fr control ui locale

* docs(matrix): tighten setup and config guidance

* chore(ui): refresh tr control ui locale

* chore(ui): refresh uk control ui locale

* chore(ui): refresh pl control ui locale

* chore(ui): refresh id control ui locale

* test: stabilize full-suite execution

* fix(matrix): contain sync outage failures (openclaw#62779)

Merged via squash.

Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras

* Align remote node exec event system messages with untrusted handling (openclaw#62659)

* fix(nodes): downgrade remote exec system events

* docs(changelog): add remote node exec event entry

---------

Co-authored-by: Devin Robison <drobison@nvidia.com>

* test: reuse image generate tool imports

* test: reuse followup runner imports

* docs(config): tighten wording in reference

* test: harden provider mock isolation

* fix(memory): accept embedded dreaming heartbeat tokens

* test: harden Parallels macOS smoke fallback

* build: narrow plugin SDK declaration build

* fix(dotenv): block workspace runtime env vars (openclaw#62660)

* fix(dotenv): block workspace runtime env vars

Co-authored-by: zsx <git@zsxsoft.com>

* docs(changelog): add workspace dotenv runtime-control entry

* fix(dotenv): block workspace gateway port override

---------

Co-authored-by: zsx <git@zsxsoft.com>
Co-authored-by: Devin Robison <drobison@nvidia.com>

* build: stage nostr runtime dependencies

* fix: load QA live provider overrides

* feat: parallelize character eval runs

* auth: avoid external cli sync on profile upsert

* test(doctor): mock memory-core runtime seam

* auth: persist explicit profile upserts directly

* Matrix: report startup failures as errors

* fix(browser): harden browser control override loading (openclaw#62663)

* fix(browser): harden browser control overrides

* fix(lint): prepare boundary artifacts for extension oxlint

* docs(changelog): add browser override hardening entry

* fix(lint): avoid duplicate boundary prep

---------

Co-authored-by: Devin Robison <drobison@nvidia.com>
Co-authored-by: Devin Robison <drobison00@users.noreply.github.com>

* test: reuse exec directive reply imports

* test: reuse verbose directive reply imports

* fix(browser): re-check interaction-driven navigations (openclaw#63226)

* fix(browser): guard interaction-driven navigations

* fix(browser): avoid rechecking unchanged interaction urls

* fix(browser): guard delayed interaction navigations

* fix(browser): guard interaction-driven navigations for full action duration

* fix(browser): avoid waiting on interaction grace timer

* fix(browser): ignore same-document hash-only URL changes in navigation guard

* fix(browser): dedupe interaction nav guards

* fix(browser): guard same-URL reloads in interaction navigation listeners

* docs(changelog): add interaction navigation guard entry

* fix(browser): drop duplicate ssrfPolicy props

* fix(browser): tighten interaction navigation guards

---------

Co-authored-by: Devin Robison <drobison@nvidia.com>

* test: move directive state coverage to pure tests

* fix: enable thinking support for the ollama api (openclaw#62712)

Merged via squash.

Prepared head SHA: c0b9950
Co-authored-by: hoyyeva <63033505+hoyyeva@users.noreply.github.com>
Co-authored-by: BruceMacD <5853428+BruceMacD@users.noreply.github.com>
Reviewed-by: @BruceMacD

* Slack: treat ACP block text as visible output (openclaw#62858)

Merged via squash.

Prepared head SHA: 14f202e
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras

* fix: fail fast on qa live auth errors

* fix: fail fast across qa scenario wait paths

* test: cover qa scenario wait failure replies

* fix: sanitize qa missing-key replies

* test: cover sanitized qa missing-key replies

* fix: align qa wait cursor semantics

* test: cover mixed-traffic qa wait cursors

* fix: classify curated qa missing-key replies

* test: cover curated qa missing-key reply classification

* fix: harden qa missing-key provider messages

* test: cover unsafe qa missing-key providers

* docs(changelog): add qa auth fail-fast entry (openclaw#63333) (thanks @shakkernerd)

* fix(matrix/doctor): migrate legacy channels.matrix.dm.policy 'trusted' (fixes openclaw#62931) (openclaw#62942)

Merged via squash.

Prepared head SHA: d9f553b
Co-authored-by: lukeboyett <46942646+lukeboyett@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras

* Memory/dreaming: feed grounded backfill into short-term promotion (openclaw#63370)

Merged via squash.

Prepared head SHA: 5dfe246
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky

* docs: update unreleased changelog

* fix(gateway): classify dream diary actions

* fix(memory): align dreaming status payloads

* Memory/dreaming: harden grounded backfill follow-ups

* test: reuse inline directive reply imports

* Docs/memory: explain grounded backfill flows

* fix(deps): patch basic-ftp advisory

* test: move inline directive collisions to pure tests

* Slack: dedupe partial streaming replies (openclaw#62859)

Merged via squash.

Prepared head SHA: cbecb50
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras

* test: replace exec directive e2e with pure coverage

* fix(plugins): keep test helpers out of contract barrels (openclaw#63311)

Merged via squash.

Prepared head SHA: 769e90c
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf

* test: move cron heartbeat delivery coverage below full turns

* fix: inter-session messages must not overwrite established external lastRoute (openclaw#58013)

Merged via squash.

Prepared head SHA: 820ea20
Co-authored-by: duqaXxX <12242811+duqaXxX@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman

* fix(gateway): suppress announce/reply skip chat leakage (openclaw#51739)

Merged via squash.

Prepared head SHA: 2f53f3b
Co-authored-by: Pinghuachiu <9033138+Pinghuachiu@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman

* Slack: key turn-local dedupe by dispatch kind

Scope Slack turn-local delivery dedupe by reply dispatch kind so identical tool and final payloads on the same thread do not collapse into one send.

Expose the existing dispatcher kind on the public reply-runtime seam and cover the Slack tracker and preview-fallback paths with regression tests.

* Dreaming: surface grounded scene lane (openclaw#63395)

Merged via squash.

Prepared head SHA: 0c7f586
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky

* test: avoid runtime auth overlays in failure-state coverage

* fix(ci): align ollama thinking expectations

* chore(ui): refresh zh-CN control ui locale

* chore(ui): refresh pt-BR control ui locale

* chore(ui): refresh zh-TW control ui locale

* chore(ui): refresh de control ui locale

* test(docker): reduce e2e log noise

* chore(ui): refresh es control ui locale

* chore(ui): refresh fr control ui locale

* chore(ui): refresh ja-JP control ui locale

* chore(ui): refresh ko control ui locale

* chore(ui): refresh uk control ui locale

* chore(ui): refresh id control ui locale

* chore(ui): refresh pl control ui locale

* chore(ui): refresh tr control ui locale

* fix: restore main ci

* fix(ci): drop silent history before truncation

* docs: reorder unreleased changelog

* test(docker): quiet success-path e2e logs

* style: sort session import

* build: mirror bundled plugin runtime deps

* plugins: load lightweight provider discovery entries

* ci: narrow Windows node test lane

* fix: filter provider auth aliases by plugin trust

* fix: surface delayed browser navigation blocks

* style: format memory and gateway touchups

* Delete docs/plans directory

Unused artifact

* test: avoid remote ollama timeout in api-key preservation coverage

* test: keep auth-choice default-model coverage on lightweight provider

* test: keep undefined-token auth-choice coverage generic

* fix: stabilize character eval and Qwen model routing

* test: keep agent command tests off external auth overlays

* fix openrouter model picker refs (openclaw#63416)

* fix openrouter model picker refs

Signed-off-by: sallyom <somalley@redhat.com>

* test(ui): cover openrouter slash-id /model resolution

---------

Signed-off-by: sallyom <somalley@redhat.com>
Co-authored-by: Vignesh Natarajan <vignesh.natarajan92@gmail.com>

* ci: stabilize macOS and transcript policy tests

* test: keep cli-provider agent command tests off external auth overlays

* chore(lint): clear extension lint regressions and add openclaw#63416 changelog

* test: update modelstudio catalog contract sentinel

* test: update character eval public panel

* fix: repair Windows dev-channel updater

* test: move copilot models-json injection coverage to plan tests

* plugin-sdk: split command status surface

* plugin-sdk: keep command status compatibility path light

* plugin-sdk: drop investigative weixin repro harness

* tests: document config mock choice for eager warmup

* fix: update command-status SDK baseline (openclaw#63174) (thanks @hxy91819)

* test: cap broad live model sweeps

* fix: drop raw gateway chat control replies

* test: make shared-token reload deterministic

* test: isolate agentic suite smoke tests

* test: replace models-config matrix with narrow coverage

* test: isolate onboard skills status mock

* plugins: add lightweight anthropic vertex discovery

* test: isolate model auth module state

* test: isolate subagent registry resume imports

* plugins: keep google provider policy lightweight

* test: keep ollama unreachable discovery on localhost

* test: mock auth profile external overlay in oauth tests

* auth: avoid plugin setup scans during common auth resolution

* fix(logging): break console/logger type cycle

* fix(config): stop owner-display barrel cycles

* fix(commands): split auth choice apply types

* fix(infra): extract exec approvals allowlist types

* fix(commands): split doctor prompt option types

* chore: prepare 2026.4.9-beta.1 release

* chore: refresh config schema version for 2026.4.9-beta.1

* chore: refresh plugin SDK API baseline

* test: run local full suite project shards in parallel

* wizard: add explicit skip option to plugin setup (openclaw#63436)

* Wizard: allow skipping plugin setup

* Agents: reset nodes tool test modules

* tests: reset discord native-command seams in model picker (openclaw#63267)

* ci: tolerate noisy npm pack json output

* test: isolate slack thread-ts recovery

* fix(msteams): isolate channel thread sessions by replyToId (openclaw#58615) (openclaw#62713)

* fix(msteams): isolate thread sessions by replyToId (openclaw#58615)

* fix(msteams): align thread ID extraction + fix test types

* fix(msteams): route thread replies to correct thread via replyToId (openclaw#58030) (openclaw#62715)

* fix(msteams): pin reply target at inbound time to prevent DM/channel leak (openclaw#54520) (openclaw#62716)

* test: keep local full suite serial by default

* chore: prepare 2026.4.9 stable release

* Agents: guard legacy pi transport override

* Agents: restore upstream pi runner sources

---------

Signed-off-by: sallyom <somalley@redhat.com>
Co-authored-by: scoootscooob <zhentongfan@gmail.com>
Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Nimrod Gutman <nimrod.gutman@gmail.com>
Co-authored-by: Tyler Warburton <Ethan.gold-Steinberg@protonmail.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
Co-authored-by: Eric Curtin <eric.curtin@docker.com>
Co-authored-by: Mariano <mbelinky@gmail.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: Frank Yang <frank.ekn@gmail.com>
Co-authored-by: termtek <termtek@ubuntu.tail2b72cd.ts.net>
Co-authored-by: Pavan Kumar Gondhi <pgondhi@nvidia.com>
Co-authored-by: Devin Robison <drobison@nvidia.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gustavo Madeira Santana <gumadeiras@gmail.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: Agustin Rivera <31522568+eleqtrizit@users.noreply.github.com>
Co-authored-by: zsx <git@zsxsoft.com>
Co-authored-by: Devin Robison <drobison00@users.noreply.github.com>
Co-authored-by: Eva H <63033505+hoyyeva@users.noreply.github.com>
Co-authored-by: BruceMacD <5853428+BruceMacD@users.noreply.github.com>
Co-authored-by: Shakker <shakkerdroid@gmail.com>
Co-authored-by: lukeboyett <46942646+lukeboyett@users.noreply.github.com>
Co-authored-by: Altay <altay@uinaf.dev>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Co-authored-by: Accunza <12242811+duqaXxX@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Co-authored-by: Pinghuachiu <9033138+Pinghuachiu@users.noreply.github.com>
Co-authored-by: Radek Sienkiewicz <mail@velvetshark.com>
Co-authored-by: Sally O'Malley <somalley@redhat.com>
Co-authored-by: Vignesh Natarajan <vignesh.natarajan92@gmail.com>
Co-authored-by: Mason Huang <masonxhuang@tencent.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: pashpashpash <nik@vault77.ai>
Co-authored-by: sudie-codes <suvenkat95@gmail.com>
zhonghe0615 pushed a commit to zhonghe0615/openclaw that referenced this pull request Apr 27, 2026
Merged via squash.

Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
Merged via squash.

Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
Merged via squash.

Prepared head SHA: 901bb76
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras

Labels

channel: matrix (Channel integration: matrix) · maintainer (Maintainer-authored PR) · size: XL

Development

Successfully merging this pull request may close these issues.

Matrix provider connection failure causes rapid gateway process crash loop

2 participants