
fix(routing): update main session lastChannel/lastTo for internal sources #34603

Open

caiqinghua wants to merge 2 commits into openclaw:main from caiqinghua:fix/34582-tui-messages-routing

Conversation

@caiqinghua

Summary

Fixes #34582

When TUI/WebChat sends messages to the main session, update lastChannel/lastTo
to internal instead of preserving stale external routes (e.g. WhatsApp).
This prevents replies from being routed to the wrong channel after TUI input.

Root Cause

Commit 7f2708a (#33786) added special handling for main sessions that returned
early for internal channels in resolveLastChannelRaw/resolveLastToRaw.
While this was intended to preserve external routes, it prevented
lastChannel/lastTo from being updated when internal sources (TUI/WebChat)
sent messages to main session.

Result: when the agent replied after TUI input, the reply used stale
lastChannel/lastTo values from the prior external channel (WhatsApp) instead
of routing back to the TUI.

Changes

Modified src/auto-reply/reply/session-delivery.ts:

  • resolveLastChannelRaw: Added exception for main session + internal channel
    to update lastChannel instead of preserving stale external routes
  • resolveLastToRaw: Added exception for main session + internal channel
    to clear lastTo instead of preserving stale external targets
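The two resolver exceptions can be sketched as follows. This is a simplified illustration, not the actual session-delivery.ts code: the session shape, the resolver signatures, and the string value of INTERNAL_MESSAGE_CHANNEL are assumptions for the sketch.

```typescript
// Simplified sketch of the fix; real shapes in
// src/auto-reply/reply/session-delivery.ts may differ.
const INTERNAL_MESSAGE_CHANNEL = "internal";

interface SessionState {
  isMainSession: boolean;
  lastChannel?: string;
  lastTo?: string;
}

// After the fix: an internal source (TUI/WebChat) hitting the main session
// overwrites lastChannel instead of returning early and leaving a stale
// external route (e.g. WhatsApp) in place.
function resolveLastChannelRaw(
  session: SessionState,
  incoming: string,
): string | undefined {
  if (session.isMainSession && incoming === INTERNAL_MESSAGE_CHANNEL) {
    session.lastChannel = INTERNAL_MESSAGE_CHANNEL;
  }
  return session.lastChannel;
}

// Likewise, the stale external target is cleared rather than preserved.
function resolveLastToRaw(
  session: SessionState,
  incoming: string,
): string | undefined {
  if (session.isMainSession && incoming === INTERNAL_MESSAGE_CHANNEL) {
    session.lastTo = undefined; // clear the stale external target
  }
  return session.lastTo;
}
```

With this shape, an external message still flows through the fallthrough path untouched, which is why external channel behavior is unchanged.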

Verification

  • Test Results: All 55 related tests pass:
    • src/auto-reply/reply/session.test.ts: 42 tests ✓
    • src/gateway/server-methods/chat.directive-tags.test.ts: 13 tests ✓

Risk & Rollback

  • Low Risk: Changes only affect main session routing for internal channels
    (TUI/WebChat). External channel behavior unchanged.
  • Rollback: Revert ce7b3e9 if any regressions in main session routing.

Issue Reference

Closes #34582

caiqinghua and others added 2 commits March 4, 2026 23:24
…34530)

After gateway restart, if a job's nextRunAtMs is more than 24 hours ahead
of its lastRunAtMs, it may have been set incorrectly before shutdown. This
could cause jobs to skip scheduled runs.

Added an additional startup pass that detects and corrects these stale
timestamps by recomputing nextRunAtMs based on the current time. This ensures
jobs resume their correct schedule after restart, fixing the issue where
cron jobs would run once per hour instead of every 10 minutes.

All 481 cron-related tests pass.
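The startup pass described in this commit message can be sketched as follows. The Job shape and the interval-based computeJobNextRunAtMs stand-in are assumptions for illustration, not the actual src/cron/service code.

```typescript
// Minimal sketch of the startup correction, assuming a simple interval job.
const DAY_MS = 24 * 60 * 60 * 1000;

interface JobState {
  lastRunAtMs: number;
  nextRunAtMs: number;
}
interface Job {
  id: string;
  intervalMs: number;
  state: JobState;
}

// Stand-in for the real scheduler: next run is one interval after the last
// run, advanced past `now` if necessary. Deterministic, so recomputing a
// correctly-set future value yields the same number.
function computeJobNextRunAtMs(job: Job, now: number): number {
  let next = job.state.lastRunAtMs + job.intervalMs;
  while (next <= now) next += job.intervalMs;
  return next;
}

function correctStaleNextRuns(jobs: Job[], now: number): void {
  for (const job of jobs) {
    const { lastRunAtMs: lastRun, nextRunAtMs: nextRun } = job.state;
    // A stored next run more than a day past the last run is suspect:
    // recompute it, and only write back if the value actually changes.
    if (nextRun - lastRun > DAY_MS) {
      const updated = computeJobNextRunAtMs(job, now);
      if (Number.isFinite(updated) && updated !== nextRun) {
        job.state.nextRunAtMs = updated;
      }
    }
  }
}
```

Note that a healthy long-interval job (weekly, monthly) always enters the branch but recomputes to the same stored value, so the `updated !== nextRun` check makes the pass a no-op for it.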
…rces

When TUI/WebChat sends messages to main session, update lastChannel/lastTo
to internal instead of preserving stale external routes (e.g. WhatsApp).
This prevents replies from routing to wrong channel after TUI input.

Root cause: commit 7f2708a added special handling for main sessions that
returned early for internal channels, but this prevented lastChannel/lastTo
from being updated, causing replies to leak to stale external routes.

Fix: Allow internal channels to update lastChannel/lastTo for main sessions
so replies route back to the correct internal surface (TUI/WebChat).

Closes openclaw#34582

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR fixes a stale-routing bug where replies after TUI/WebChat input on the main session were incorrectly sent to a prior external channel (e.g. WhatsApp) because resolveLastChannelRaw/resolveLastToRaw were preserving the external fallback instead of updating to the active internal surface. It also adds a startup safety pass in the cron service to correct nextRunAtMs values that recomputeNextRuns intentionally leaves untouched (still-future values that were set incorrectly by a prior bug).

  • session-delivery.ts – The change is a correct structural refactor. The original early-return for INTERNAL_MESSAGE_CHANNEL + main session is replaced with a prioritised branch inside the !isExternalRoutingChannel guard. Because INTERNAL_MESSAGE_CHANNEL can never satisfy isExternalRoutingChannel, the two paths are semantically equivalent, and the fix correctly allows internal sources to overwrite lastChannel/lastTo on the main session.
  • ops.ts – The new startup loop addresses a gap in recomputeNextRuns, which preserves still-future nextRunAtMs values and cannot correct one that was set too far ahead by a prior bug. The isFiniteTimestamp helper is a local duplicate of the private copy in jobs.ts — consider exporting it from jobs.ts to avoid drift. The > 24 h heuristic is safe: for long-interval schedules the loop always enters the branch but computeJobNextRunAtMs returns the same stored value so no update is applied; the inline comment slightly mis-describes the scenario (stale future values, not past-due ones).

Confidence Score: 4/5

  • Safe to merge — routing fix is logically correct and the cron startup change is a no-op for healthy jobs.
  • The session-delivery changes are a clean, semantically equivalent refactor that correctly resolves the stale-routing regression. The cron ops change is a targeted safety net that cannot inadvertently corrupt valid schedules (computeJobNextRunAtMs returns the same value for correctly-set future timestamps, blocking the update). The only deductions are the lack of dedicated tests for the new cron startup logic and the minor isFiniteTimestamp duplication between ops.ts and jobs.ts.
  • No files require special attention beyond the style observations noted in src/cron/service/ops.ts.

Last reviewed commit: d661610

Comment on lines +25 to +27

```ts
function isFiniteTimestamp(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}
```
Contributor


Duplicate of private helper in jobs.ts

isFiniteTimestamp is already defined as a private helper in src/cron/service/jobs.ts (line 83–85) with the identical implementation. Since it isn't exported there, a local copy is needed here — but it may be worth exporting the one from jobs.ts and importing it to avoid the duplication and keep the guard logic in sync if it ever changes.

Suggested change:

```ts
// Consider exporting isFiniteTimestamp from jobs.ts and importing here
function isFiniteTimestamp(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}
```

Comment on lines +142 to +155

```ts
// If nextRunAtMs is more than a day ahead of lastRunAtMs, it might be stale.
// For example, if the gateway was down for multiple days, nextRunAtMs
// might have been set to a time that's already passed, but the job hasn't
// run yet. Recomputing ensures the job runs at the next valid slot.
if (nextRun - lastRun > 24 * 60 * 60 * 1000) {
  const updated = computeJobNextRunAtMs(job, now);
  if (isFiniteTimestamp(updated) && updated !== nextRun) {
    job.state.nextRunAtMs = updated;
    state.deps.log.info(
      { jobId: job.id, oldNextRun: nextRun, newNextRun: updated },
      "cron: corrected stale nextRunAtMs after restart",
    );
  }
}
```
Contributor


Heuristic may be misleading for long-interval schedules

The condition nextRun - lastRun > 24 * 60 * 60 * 1000 will always be true for any job whose schedule interval is longer than 24 hours (e.g. a weekly or monthly cron job), causing computeJobNextRunAtMs to be called on every gateway restart for those jobs. In practice this is harmless — computeJobNextRunAtMs is deterministic and will return the same value that is already stored, so updated !== nextRun keeps the guard from applying an incorrect change — but it produces unnecessary log lines and makes the intent of the heuristic harder to reason about.

The comment also says "if the gateway was down for multiple days, nextRunAtMs might have been set to a time that's already passed", but that scenario is already fully handled by runMissedJobs + recomputeNextRuns above. What this loop actually targets is a stale future nextRunAtMs (set too far ahead by a prior bug) that recomputeNextRuns intentionally skips because now < nextRunAtMs. Tightening the comment (and optionally the guard) to reflect this would make the intent clearer:

Suggested change:

```ts
// Guard: if nextRunAtMs is more than a day past what the current schedule
// would produce from lastRunAtMs, the stored value was likely set incorrectly
// (e.g. by a prior bug) and recomputeNextRuns() won't fix it because it
// preserves still-future values. Recompute so the job fires at the correct slot.
if (nextRun - lastRun > 24 * 60 * 60 * 1000) {
  const updated = computeJobNextRunAtMs(job, now);
  if (isFiniteTimestamp(updated) && updated !== nextRun) {
    job.state.nextRunAtMs = updated;
    state.deps.log.info(
      { jobId: job.id, oldNextRun: nextRun, newNextRun: updated },
      "cron: corrected stale nextRunAtMs after restart",
    );
  }
}
```


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d661610201


Comment on lines +146 to +149

```ts
if (nextRun - lastRun > 24 * 60 * 60 * 1000) {
  const updated = computeJobNextRunAtMs(job, now);
  if (isFiniteTimestamp(updated) && updated !== nextRun) {
    job.state.nextRunAtMs = updated;
```


P2: Preserve one-shot retry backoff during startup correction

This startup pass recomputes nextRunAtMs for any job where nextRunAtMs - lastRunAtMs > 24h, but for schedule.kind === "at" the recompute path returns the original at timestamp (see computeJobNextRunAtMs), not the transient-error retry timestamp set by applyJobResult. If a deployment uses a long retry backoff (e.g. cron.retry.backoffMs > 24h), restarting between failure and retry will overwrite the delayed retry with the original (often past) at time, making the one-shot job run immediately and bypassing the configured backoff.


```ts
// might have been set to a time that's already passed, but the job hasn't
// run yet. Recomputing ensures the job runs at the next valid slot.
if (nextRun - lastRun > 24 * 60 * 60 * 1000) {
  const updated = computeJobNextRunAtMs(job, now);
```


P2: Wrap startup recompute in the same schedule-error guard

computeJobNextRunAtMs can throw on malformed persisted cron schedules/timezones, and the normal recompute flow already protects this with try/catch (recomputeJobNextRunAtMs). This new startup branch calls it directly without error handling, so one bad stored job matching this condition can abort start() before timers are armed, preventing all cron jobs from running after restart.
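A minimal sketch of the guarded pass this comment suggests. The Job shape, the throwing computeNext stand-in, and the warn callback are hypothetical; the real jobs.ts helpers and error handling may differ.

```typescript
// Hypothetical illustration: one job with a malformed persisted schedule
// must not abort the whole startup pass, mirroring the try/catch that the
// comment says recomputeJobNextRunAtMs already uses.
interface Job {
  id: string;
  nextRunAtMs: number;
}

// Stand-in for computeJobNextRunAtMs; throws for one malformed job.
function computeNext(job: Job): number {
  if (job.id === "bad") throw new Error("malformed cron expression");
  return job.nextRunAtMs + 1;
}

function startupPass(jobs: Job[], warn: (msg: string) => void): void {
  for (const job of jobs) {
    try {
      job.nextRunAtMs = computeNext(job);
    } catch {
      // Log and skip, so the remaining jobs still get their timers armed.
      warn(`cron: failed to recompute nextRunAtMs for ${job.id}`);
    }
  }
}
```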



Development

Successfully merging this pull request may close these issues.

[Bug]: Messages Only Show Up on Whatsapp when type on Terminal
