Skip to content

fix: preserve canonical restart sentinel routes#64391

Merged
gumadeiras merged 5 commits into
mainfrom
codex/fix-restart-sentinel-canonical-routing
Apr 10, 2026
Merged

fix: preserve canonical restart sentinel routes#64391
gumadeiras merged 5 commits into
mainfrom
codex/fix-restart-sentinel-canonical-routing

Conversation

@gumadeiras

Copy link
Copy Markdown
Member

Summary

  • Problem: restart-sentinel notices could fall back to reconstructing outbound targets from session keys when canonical delivery context was missing after restart.
  • Why it matters: case-sensitive channels like Matrix can carry lossy/lowercased session keys, so that fallback can misroute the post-restart notice even though the wake event is still valid.
  • What changed: extractDeliveryInfo() now synthesizes canonical stored delivery routes via deliveryContextFromSession(...), and restart-sentinel notice delivery no longer falls back to session-key target reconstruction.
  • What did NOT change (scope boundary): this PR does not change normal outbound delivery-queue recovery behavior; it only hardens the restart-sentinel notice path.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: restart-sentinel notice delivery merged sentinel context with stored session context, then fell back to resolveAnnounceTargetFromKey(...) when canonical route data was missing. That fallback reconstructs a destination from session-key identity, which is unsafe for case-sensitive channels.
  • Missing detection / guardrail: no regression covered the "stored route missing, session key lossy" restart path.
  • Contributing context (if known): normal Matrix queue recovery already prefers canonical stored route metadata; the restart-sentinel path had older fallback behavior.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/server-restart-sentinel.test.ts, src/config/sessions/delivery-info.test.ts
  • Scenario the test should lock in: restart-sentinel skips outbound notice when no canonical route survives restart, and stored session metadata still yields the canonical Matrix route when deliveryContext is absent.
  • Why this is the smallest reliable guardrail: the bug lives at the boundary between session metadata recovery and restart-sentinel delivery routing.
  • Existing test that already covers this (if any): src/infra/outbound/delivery-queue.recovery.test.ts already covers the permanent-failure recovery side, but not restart-sentinel fallback.
  • If no new test is added, why not:

User-visible / Behavior Changes

  • Restart notices no longer guess a target from a lossy session key when canonical route metadata is unavailable after restart.
  • When canonical route metadata still exists, restart notices continue routing normally.

Diagram (if applicable)

Before:
[restart sentinel] -> [missing canonical route] -> [session-key fallback target] -> [possible misdelivery]

After:
[restart sentinel] -> [missing canonical route] -> [skip outbound notice] -> [wake/system event still delivered]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local Node 22 + pnpm
  • Model/provider: N/A
  • Integration/channel (if any): Matrix restart notice routing
  • Relevant config (redacted): session store entries with canonical Matrix lastTo / origin.to

Steps

  1. Persist a Matrix session whose session key is lossy/lowercased.
  2. Remove canonical restart delivery context from the sentinel payload.
  3. Trigger restart-sentinel wake handling.

Expected

  • No outbound restart notice is sent from a reconstructed session-key target.
  • Canonical stored route metadata is used when present.

Actual

  • Previously, restart-sentinel could fall back to session-key-derived routing.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: targeted tests for delivery-info, restart-sentinel, Matrix session-route, and delivery-queue recovery/policy.
  • Edge cases checked: Matrix mixed-case canonical route stored without deliveryContext; restart-sentinel with no surviving canonical route; existing permanent Matrix 403 recovery coverage still green.
  • What you did not verify: full end-to-end gateway restart against a live Matrix server.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: skipping outbound restart notice when canonical route metadata is missing can suppress a notice in some stale-session cases.
    • Mitigation: the wake/system event still fires, and we avoid misdelivery to the wrong conversation.

Copilot AI review requested due to automatic review settings April 10, 2026 15:55
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: S maintainer Maintainer-authored PR labels Apr 10, 2026
@gumadeiras gumadeiras self-assigned this Apr 10, 2026
@greptile-apps

greptile-apps Bot commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens the restart-sentinel notice path for case-sensitive channels like Matrix by (1) making extractDeliveryInfo() use deliveryContextFromSession() to synthesize canonical routes from lastChannel/lastTo metadata even when the deliveryContext field itself is absent, and (2) removing the resolveAnnounceTargetFromKey fallback in scheduleRestartSentinelWake so that a missing canonical route causes a silent skip rather than misdelivery to a lowercased session-key-derived target. Both new code paths are covered by targeted tests.

Confidence Score: 5/5

Safe to merge — the fix is tightly scoped to the restart-sentinel notice path and is fully covered by new targeted tests.

All four changed files contain only correct, well-tested logic. The sole observation (redundant threadId fallback) is unreachable dead code and carries no behavioral risk. No P0 or P1 issues found.

No files require special attention.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/config/sessions/delivery-info.ts
Line: 32-33

Comment:
**Redundant threadId fallback**

`deliveryContextFromSession` already synthesizes `threadId` from `lastThreadId ?? deliveryContext?.threadId ?? origin?.threadId` internally, so the `?? entry?.lastThreadId ?? entry?.origin?.threadId` portion of the fallback chain here is unreachable in practice. The behavior is correct but the extra chain adds noise.

```suggestion
        storedDeliveryContext.threadId;
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: preserve canonical restart sentinel..." | Re-trigger Greptile

Comment thread src/config/sessions/delivery-info.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3dcbefdde0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/config/sessions/delivery-info.ts Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens restart-sentinel notice routing to avoid reconstructing outbound targets from lossy session keys (notably for case-sensitive channels like Matrix), and instead relies on canonical stored delivery route metadata when available.

Changes:

  • Remove restart-sentinel fallback routing via resolveAnnounceTargetFromKey(...), skipping outbound notice delivery when canonical route data is unavailable.
  • Update extractDeliveryInfo() to derive delivery context using deliveryContextFromSession(...) (leveraging stored last-route metadata when deliveryContext is missing).
  • Add/adjust tests for restart-sentinel behavior and delivery-info route derivation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/gateway/server-restart-sentinel.ts Stops reconstructing notice targets from session keys; only routes when canonical context is present.
src/gateway/server-restart-sentinel.test.ts Adds coverage for “skip outbound notice when no canonical context”; updates mocks.
src/config/sessions/delivery-info.ts Uses deliveryContextFromSession() to synthesize delivery context from stored session metadata.
src/config/sessions/delivery-info.test.ts Adds a test intended to cover deriving delivery info from stored last-route metadata.

Comment thread src/config/sessions/delivery-info.test.ts
Comment thread src/gateway/server-restart-sentinel.test.ts
@gumadeiras gumadeiras force-pushed the codex/fix-restart-sentinel-canonical-routing branch 2 times, most recently from 16c8fac to be97d69 Compare April 10, 2026 16:18

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1acabb421c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/server-restart-sentinel.ts
@aisle-research-bot

aisle-research-bot Bot commented Apr 10, 2026

Copy link
Copy Markdown

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Partial restart-sentinel deliveryContext can override canonical route, causing misrouted outbound notice
2 🟡 Medium Restart sentinel JSON file lacks integrity and secure permissions, enabling tampering to trigger outbound messages
1. 🟡 Partial restart-sentinel deliveryContext can override canonical route, causing misrouted outbound notice
Property Value
Severity Medium
CWE CWE-285
Location src/gateway/server-restart-sentinel.ts:154-174

Description

In scheduleRestartSentinelWake, the outbound destination is derived by merging the restart sentinel's payload.deliveryContext (primary) with the session store context (fallback):

  • payload.deliveryContext is loaded from a local JSON file (restart-sentinel.json) and may be missing channel but still contain to/accountId.
  • mergeDeliveryContext(primary, fallback) only prevents mixing when both sides provide different channel values. If the primary omits channel, it can still override to/accountId while inheriting the fallback channel.
  • This allows creation of a valid-but-wrong route (channel from session store + to from sentinel), potentially sending the restart notice to an unintended conversation/recipient.

Vulnerable flow:

  • input: payload.deliveryContext from consumeRestartSentinel() (filesystem JSON)
  • merge: mergeDeliveryContext(sentinelContext, sessionDeliveryContext)
  • sink: deliverOutboundPayloads({ channel, to: resolved.to, accountId: origin?.accountId, ... })

Vulnerable code:

const sentinelContext = payload.deliveryContext;
let sessionDeliveryContext = deliveryContextFromSession(entry);
...
const origin = mergeDeliveryContext(sentinelContext, sessionDeliveryContext);

const channelRaw = origin?.channel;
const channel = channelRaw ? normalizeChannelId(channelRaw) : null;
const to = origin?.to;
if (!channel || !to) {
  return;
}

Recommendation

Ensure the sentinel context is only preferred when it is routable as a pair (has both channel and to), otherwise ignore its route fields entirely.

One safe approach:

const sentinelContext = payload.deliveryContext;
const sessionContext = sessionDeliveryContext;

const origin = hasRoutableDeliveryContext(sentinelContext)
  ? mergeDeliveryContext(sentinelContext, sessionContext)
  : sessionContext;

Alternatively, harden mergeDeliveryContext() to avoid cross-field mixing when the primary has to/accountId/threadId but no channel (treat that as a conflict and prefer fallback fields as a unit).

2. 🟡 Restart sentinel JSON file lacks integrity and secure permissions, enabling tampering to trigger outbound messages
Property Value
Severity Medium
CWE CWE-347
Location src/infra/restart-sentinel.ts:69-72

Description

The restart sentinel mechanism persists routing metadata to a JSON file in the state directory and later consumes it to send a restart notice. The persisted file is written and read without any integrity protection (signature/MAC) and without enforcing restrictive filesystem permissions.

If an attacker can write to the configured state directory (e.g., via a misconfigured OPENCLAW_STATE_DIR, shared home directory, permissive umask, container volume with weak permissions, or another local compromise), they can tamper with restart-sentinel.json to inject an attacker-chosen payload.deliveryContext (channel/to) and cause the gateway to send outbound restart messages to arbitrary destinations (confused-deputy).

Vulnerable behavior:

  • writeRestartSentinel() writes restart-sentinel.json with default permissions.
  • readRestartSentinel()/consumeRestartSentinel() trusts parsed JSON and does not authenticate/validate the payload beyond basic JSON structure.

Vulnerable code:

await fs.mkdir(path.dirname(filePath), { recursive: true });
const data: RestartSentinel = { version: 1, payload };
await fs.writeFile(filePath, `${JSON.stringify(data, null, 2)}\n`, "utf-8");

Recommendation

Harden the sentinel persistence against tampering:

  1. Enforce restrictive permissions on the state dir and the sentinel file:
await fs.mkdir(path.dirname(filePath), { recursive: true, mode: 0o700 });
await fs.writeFile(filePath, content, { encoding: "utf-8", mode: 0o600, flag: "w" });
  1. Add strict schema validation for payload.deliveryContext and payload.sessionKey (e.g., zod/io-ts) and reject unknown/invalid values.

  2. If sentinel files may be stored in locations writable by other principals (shared volumes), add an integrity mechanism (e.g., HMAC over the payload using a secret stored outside the writable directory) and verify it before acting on any routing fields.

  3. Consider ignoring payload.deliveryContext unless it matches (or is consistent with) the session store’s last known channel/to for that sessionKey.


Analyzed PR: #64391 at commit 0183c17

Last updated on: 2026-04-10T16:53:04Z

@gumadeiras gumadeiras force-pushed the codex/fix-restart-sentinel-canonical-routing branch from 126d103 to e5ff872 Compare April 10, 2026 16:42
@gumadeiras gumadeiras force-pushed the codex/fix-restart-sentinel-canonical-routing branch from e5ff872 to 0183c17 Compare April 10, 2026 16:43
@gumadeiras gumadeiras merged commit 9c44f10 into main Apr 10, 2026
9 checks passed
@gumadeiras gumadeiras deleted the codex/fix-restart-sentinel-canonical-routing branch April 10, 2026 16:44
@gumadeiras

Copy link
Copy Markdown
Member Author

Merged via squash.

Thanks @gumadeiras!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0183c1782f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/server-restart-sentinel.ts
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
Merged via squash.

Prepared head SHA: 0183c17
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
Merged via squash.

Prepared head SHA: 0183c17
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Merged via squash.

Prepared head SHA: 0183c17
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
Merged via squash.

Prepared head SHA: 0183c17
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
Merged via squash.

Prepared head SHA: 0183c17
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants