
fix: time travel replay/fork for graphs with interrupts and subgraphs#2325

Closed
open-swe[bot] wants to merge 1 commit into main from open-swe/776ee51f-5e2f-752b-acb1-4584467156d4


open-swe Bot commented Apr 16, 2026

Contributor

Description

Ports three Python PRs (#7038, #7115, #7498) that fix time travel (replay and fork) for graphs with interrupts and subgraphs.

Problems fixed

  1. Stale interrupt values during replay — Replays incorrectly reused cached RESUME values from prior interrupt() calls, so interrupts silently returned stale answers instead of re-firing.
  2. Wrong subgraph state during time travel — Subgraphs always loaded their latest checkpoint instead of the one corresponding to the parent's historical state. This caused subgraphs to skip execution or produce incorrect results during replay/fork.
  3. Wrong checkpoint loaded on direct-to-subgraph time travel — When time-traveling directly to a subgraph checkpoint, the subgraph's own namespace appearing in checkpoint_map wasn't detected, causing stale RESUME writes to be preserved instead of stripped.
  4. No fork checkpoint on time travel — When replaying from a specific checkpoint, if the execution hit an interrupt before _putCheckpoint() ran, no new checkpoint was created. The parent's "latest" checkpoint remained the old one, so subsequent Command(resume=...) calls loaded the wrong state.

Changes

constants.ts: Added CONFIG_KEY_REPLAY_STATE constant and added it to the RESERVED set.

pregel/replay.ts (new): ReplayState class that tracks which subgraph namespaces have already loaded their pre-replay checkpoint. On the first visit to a subgraph namespace, it loads the latest checkpoint created before the replay point (via checkpointer.list(..., { before: ..., limit: 1 })). On subsequent visits (e.g. the same subgraph in a later loop iteration), it falls back to normal latest-checkpoint loading. The task-id suffix is stripped from namespaces so that the same logical subgraph is recognized across loop iterations.
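The first-visit versus later-visit behavior described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `CheckpointRef`, `CheckpointStore`, `InMemoryStore`, and `ReplayStateSketch` are hypothetical names, checkpoint ids are assumed to sort by creation time, and namespaces are assumed to look like `node:taskId` segments joined by `|`.

```typescript
// Hypothetical, simplified types. The real checkpointer interface is richer.
interface CheckpointRef {
  namespace: string;
  id: string; // assumed to sort lexicographically by creation time
}

interface CheckpointStore {
  // Latest checkpoint in `namespace`, optionally only those created before `before`.
  latest(namespace: string, before?: string): CheckpointRef | undefined;
}

// Assumption: each namespace segment has the form "node:taskId".
function stripTaskIds(namespace: string): string {
  return namespace
    .split("|")
    .map((segment) => segment.split(":")[0])
    .join("|");
}

class ReplayStateSketch {
  private readonly visited = new Set<string>();
  private readonly replayCheckpointId: string;

  constructor(replayCheckpointId: string) {
    this.replayCheckpointId = replayCheckpointId;
  }

  getCheckpoint(store: CheckpointStore, namespace: string): CheckpointRef | undefined {
    // Key on the logical subgraph so loop iterations share one entry.
    const key = stripTaskIds(namespace);
    if (this.visited.has(key)) {
      // Later visit in the same run: normal latest-checkpoint loading.
      return store.latest(namespace);
    }
    this.visited.add(key);
    // First visit: load the state that existed at the replay point.
    return store.latest(namespace, this.replayCheckpointId);
  }
}

// Toy store used only to exercise the sketch.
class InMemoryStore implements CheckpointStore {
  private readonly checkpoints: CheckpointRef[];

  constructor(checkpoints: CheckpointRef[]) {
    this.checkpoints = checkpoints;
  }

  latest(namespace: string, before?: string): CheckpointRef | undefined {
    const candidates = this.checkpoints
      .filter((c) => c.namespace === namespace)
      .filter((c) => before === undefined || c.id < before)
      .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    return candidates[candidates.length - 1];
  }
}
```

Keying the visited set on the stripped namespace is what lets a second loop iteration of the same subgraph (with a new task id) skip the historical lookup.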

pregel/loop.ts:

  • Added isReplaying getter (equivalent to Python's is_replaying) — returns true when skipDoneTasks is false (i.e., replaying from a specific checkpoint).
  • Strip stale RESUME writes on replay — During replays, cached RESUME writes are filtered out so interrupt() re-fires instead of returning old values. Genuine resumes (Command(resume=...)) preserve these writes.
  • Time-travel detection for subgraphs (PR #7115) — When a subgraph's own namespace appears in checkpoint_map, it's detected as time-travel and RESUME writes are force-stripped even when CONFIG_KEY_RESUMING is set by the parent.
  • Eager fork checkpoint on time travel (PR #7498) — When the loop detects a time-travel replay (not an update_state fork), it eagerly writes a fork checkpoint (source="fork") at the start of the tick. This ensures the parent thread's latest checkpoint points to the replayed state and subsequent resumes find the correct checkpoint. Stale INTERRUPT pending writes are also cleared.
  • ReplayState propagation — The parent graph creates a ReplayState instance and passes it to subgraphs via config. For forks (source=update/fork), the replay state uses the fork's parent checkpoint ID.
  • Subgraph checkpoint loading — When a subgraph detects a ReplayState in its config (and no explicit checkpoint_id), it delegates checkpoint loading to ReplayState.getCheckpoint() instead of using the default getTuple. It also clears CONFIG_KEY_RESUMING so the subgraph re-applies input naturally.
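The RESUME-stripping rules in the list above can be summarized as a small decision function. This is a sketch under stated assumptions rather than the actual loop code: `LoopContext`, `shouldStripResume`, and `filterPendingWrites` are hypothetical names, the `"__resume__"` channel name is assumed, and the way the replay and resume flags are combined is one plausible reading of the description.

```typescript
// Hypothetical simplified shape: [taskId, channel, value].
type PendingWrite = [string, string, unknown];

// Assumed channel name for cached resume values.
const RESUME = "__resume__";

interface LoopContext {
  isReplaying: boolean; // run started from a specific historical checkpoint
  isResuming: boolean; // genuine resume, e.g. via Command(resume=...)
  ownNamespace: string; // "" for the root graph
  checkpointMap?: Record<string, string>; // namespace -> checkpoint id
}

// Direct-to-subgraph time travel: the subgraph's own namespace is in checkpoint_map.
function isDirectSubgraphTimeTravel(ctx: LoopContext): boolean {
  const map = ctx.checkpointMap;
  return (
    ctx.ownNamespace !== "" &&
    map !== undefined &&
    Object.prototype.hasOwnProperty.call(map, ctx.ownNamespace)
  );
}

function shouldStripResume(ctx: LoopContext): boolean {
  // Forced strip, even when the parent marked the run as resuming.
  if (isDirectSubgraphTimeTravel(ctx)) return true;
  // Plain replay strips; a genuine resume keeps its cached values.
  return ctx.isReplaying && !ctx.isResuming;
}

function filterPendingWrites(writes: PendingWrite[], ctx: LoopContext): PendingWrite[] {
  return shouldStripResume(ctx)
    ? writes.filter(([, channel]) => channel !== RESUME)
    : writes;
}
```

With the stale RESUME write removed, a later `interrupt()` call finds no cached answer and fires again instead of returning the old value.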

tests/python_port/checkpoint.test.ts: Updated existing test_running_from_checkpoint_id_retains_previous_writes to account for the new fork checkpoint appearing in history (history length increases by 2 instead of 1).
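The extra history entry can be illustrated with a toy model of a thread's checkpoint history. Everything here is hypothetical (`ToyThread`, `replay`, numeric ids) and only mirrors the behavior described above: without an eager fork, a replay that interrupts before a checkpoint is written leaves the old head in place, while with one the head descends from the replay point.

```typescript
type Source = "input" | "loop" | "fork";

interface Entry {
  id: number;
  source: Source;
  parentId: number | null;
}

// Toy append-only checkpoint history for one thread.
class ToyThread {
  readonly history: Entry[] = [];

  append(source: Source, parentId: number | null): Entry {
    const entry: Entry = { id: this.history.length + 1, source, parentId };
    this.history.push(entry);
    return entry;
  }

  latest(): Entry | undefined {
    return this.history[this.history.length - 1];
  }
}

interface ReplayOptions {
  eagerFork: boolean; // write a fork checkpoint at the start of the tick
  interruptsBeforeCheckpoint: boolean; // run stops before the regular checkpoint write
}

function replay(thread: ToyThread, fromId: number, opts: ReplayOptions): void {
  let parent = fromId;
  if (opts.eagerFork) {
    // The fork branches from the checkpoint being replayed.
    parent = thread.append("fork", fromId).id;
  }
  if (!opts.interruptsBeforeCheckpoint) {
    // Regular end-of-step checkpoint.
    thread.append("loop", parent);
  }
}

// Helper that builds a thread with three existing checkpoints.
function seededThread(): ToyThread {
  const thread = new ToyThread();
  const input = thread.append("input", null);
  const step1 = thread.append("loop", input.id);
  thread.append("loop", step1.id);
  return thread;
}
```

In this model a completed replay step adds two entries (fork plus loop) rather than one, which matches the adjusted expectation in the updated test.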

Tests

New test file time_travel.test.ts with 12 tests covering:

PR #7038 tests:

  • Replay from checkpoint before interrupt strips stale RESUME writes
  • Replay with subgraph strips stale RESUME writes

PR #7115 tests:

  • Time travel to subgraph checkpoint at first interrupt (replay + fork)
  • Time travel to subgraph checkpoint at second interrupt (replay + fork)
  • Time travel to subgraph checkpoint after completion
  • Time travel to middle subgraph in 3-level graph (replay + fork)
  • Time travel when middle subgraph has its own interrupts (replay + fork at both levels)

PR #7498 tests:

  • Replay from before interrupt then resume — verifies fork checkpoint created and full checkpoint history
  • Subgraph time travel resume from first interrupt — verifies fork + full resume flow
  • Subgraph time travel resume from second interrupt — verifies first answer preserved
  • Subgraph time travel checkpoint pattern — verifies fork branches from correct replay point
  • Replay from parent checkpoint with subgraph interrupt then resume — verifies full flow with router + subgraph + post-processing

Test Plan

  • All 12 new time_travel tests pass
  • All 180 existing pregel tests pass (no regressions)
  • All 13 checkpoint tests pass (1 updated for new fork checkpoint)
  • All 10 interrupt tests pass
  • Build succeeds, lint and format pass

Opened collaboratively by Sydney Runkle and open-swe.

Ports Python PRs #7038, #7115, #7498 to fix:
- Stale RESUME writes during replay (interrupts returned cached answers)
- Wrong subgraph checkpoint loading during time travel
- Missing fork checkpoint on time travel (resumes loaded wrong state)

Adds ReplayState class, isReplaying property, eager fork checkpoint
creation, and direct-to-subgraph time travel detection.

Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>

changeset-bot Bot commented Apr 16, 2026


⚠️ No Changeset found

Latest commit: a0a713a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



pkg-pr-new Bot commented Apr 16, 2026


@langchain/langgraph-checkpoint

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-checkpoint@2325

@langchain/langgraph-checkpoint-mongodb

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-checkpoint-mongodb@2325

@langchain/langgraph-checkpoint-postgres

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-checkpoint-postgres@2325

@langchain/langgraph-checkpoint-redis

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-checkpoint-redis@2325

@langchain/langgraph-checkpoint-sqlite

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-checkpoint-sqlite@2325

@langchain/langgraph-checkpoint-validation

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-checkpoint-validation@2325

create-langgraph

npm i https://pkg.pr.new/langchain-ai/langgraphjs/create-langgraph@2325

@langchain/langgraph-api

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-api@2325

@langchain/langgraph-cli

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-cli@2325

@langchain/langgraph

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph@2325

@langchain/langgraph-cua

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-cua@2325

@langchain/langgraph-supervisor

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-supervisor@2325

@langchain/langgraph-swarm

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-swarm@2325

@langchain/langgraph-ui

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-ui@2325

@langchain/langgraph-sdk

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/langgraph-sdk@2325

@langchain/angular

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/angular@2325

@langchain/react

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/react@2325

@langchain/svelte

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/svelte@2325

@langchain/vue

npm i https://pkg.pr.new/langchain-ai/langgraphjs/@langchain/vue@2325

commit: a0a713a

@christian-bromann

Member

Closing in favor of #2179

Christian Bromann (christian-bromann) added a commit that referenced this pull request Jun 10, 2026
## Summary

Ports Python time-travel fixes
([#7038](langchain-ai/langgraph#7038),
[#7115](langchain-ai/langgraph#7115),
[#7498](langchain-ai/langgraph#7498),
[#7499](langchain-ai/langgraph#7499)) into
`@langchain/langgraph` so replay/fork behave correctly with interrupts
and nested subgraphs.

- **Stale `RESUME` on replay** — Replaying from a checkpoint before an
interrupt no longer consumes cached resume writes; interrupts re-fire
with the correct payload.
- **Subgraph checkpoint loading on time travel** — Introduces
`ReplayState` (`CONFIG_KEY_REPLAY_STATE`) so nested subgraphs load the
checkpoint that existed at the replay point on first visit, then resume
normal head loading within the same run.
- **Parent fork checkpoints on replay** — Time travel runs through
`PregelLoop._first()` (not `stream()` delegation on the parent
`Pregel`), creating an eager `source: "fork"` checkpoint and propagating
`ReplayState` to subgraphs.
- **Direct-to-subgraph time travel** — `getState()` subgraph delegation
is guarded with `CONFIG_KEY_READ`; direct subgraph configs strip stale
`RESUME` writes and prefer explicit `checkpoint_id` over
`checkpoint_map` when both are set.
- **Streaming** — Fixes subgraph interrupt namespace when streaming with
`subgraphs: true` (empty `checkpoint_ns` no longer becomes `[""]`;
parent emits interrupts under the deepest `checkpoint_map` namespace).
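The `[""]` case in the streaming item comes from how JavaScript splits an empty string. The following sketch shows the pitfall and one way to guard against it; `NS_SEPARATOR` and `namespaceToPath` are hypothetical names, and the `|` separator is an assumption about the namespace format.

```typescript
// Assumption: "|" separates namespace segments.
const NS_SEPARATOR = "|";

// Splitting an empty string yields one empty segment, not an empty array.
const naive: string[] = "".split(NS_SEPARATOR);

// Guarded conversion: the root namespace maps to an empty path.
function namespaceToPath(checkpointNs: string): string[] {
  return checkpointNs === "" ? [] : checkpointNs.split(NS_SEPARATOR);
}
```

Here `naive` holds a single empty string, whereas `namespaceToPath("")` returns an empty array, so root-level events are not attributed to a nameless subgraph.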

Closes #2325 (supersedes the earlier partial port).

### Implementation notes

| Area | Change |
|------|--------|
| `pregel/replay.ts` | New `ReplayState` class (mirrors Python) |
| `pregel/loop.ts` | Replay/time-travel detection, fork creation, `RESUME` stripping, `ReplayState` wiring, stream namespace helpers |
| `pregel/index.ts` | `getState` subgraph delegation guard only (removed `stream()` bypass that skipped parent fork creation) |
| Tests | `time_travel.test.ts` (14), `time_travel_extended.test.ts` (33), shared `time_travel_helpers.ts`, Vitest matchers `toBeInterrupted` / `toHaveInterruptValue` |

---------

Co-authored-by: Cursor <cursoragent@cursor.com>