fix: replay behavior for parent + subgraphs! #7038
Merged
William FH (hinthornw) approved these changes on Mar 7, 2026
…raph into sr/diabolical
xingshuozhu1998 pushed a commit to xingshuozhu1998/langgraph that referenced this pull request on May 1, 2026
## Summary

Fix time travel (replay and fork) for graphs with interrupts and subgraphs.

## Problem

Two issues with replaying/forking from earlier checkpoints:

1. **Stale interrupt values during replay**: Replays incorrectly reused cached `RESUME` values from prior `interrupt()` calls, so interrupts silently returned stale answers instead of re-firing.
2. **Wrong subgraph state during time travel**: Subgraphs always loaded their **latest** checkpoint instead of the one corresponding to the parent's historical state. This caused subgraphs to skip execution or produce incorrect results during replay/fork.

## Changes

Code changes span `libs/langgraph/langgraph/pregel/_loop.py`, `libs/langgraph/langgraph/_internal/_constants.py`, and a new `libs/langgraph/langgraph/_internal/_replay.py` module:

- **Strip stale `RESUME` writes on replay**: During replays, cached `RESUME` writes are filtered out so `interrupt()` re-fires instead of returning old values. Genuine resumes (`Command(resume=...)`) preserve these writes.
- **Rename `skip_done_tasks` → `is_replaying`**: Clearer naming for the flag that tracks whether the current run is replaying from a specific checkpoint.
- **New `ReplayState` class (`_replay.py`)**: Encapsulates subgraph checkpoint loading during time travel. It tracks a parent checkpoint ID upper bound and which subgraph namespaces have already loaded their pre-replay checkpoint. On the first visit to a subgraph namespace, it loads the latest checkpoint created *before* the replay point (via `checkpointer.list(..., before=...)` with `limit=1`). On subsequent visits (e.g. the same subgraph in a later loop iteration), it falls back to normal latest-checkpoint loading. The task-id suffix is stripped from namespaces so the same logical subgraph is recognized across loop iterations.
- **New `CONFIG_KEY_REPLAY_STATE` config key**: The parent graph creates a `ReplayState` instance and passes it to subgraphs via config. For forks (`source=update`), the replay state uses the fork's parent checkpoint ID, because the fork was created after the subgraph's original checkpoints. The single `ReplayState` instance is shared by reference across all derived configs within one parent execution.
- **Subgraph checkpoint loading in `__enter__`/`__aenter__`**: When a subgraph detects a `ReplayState` in its config, it delegates checkpoint loading to `ReplayState.get_checkpoint`/`aget_checkpoint` instead of using the default `get_tuple`. It also clears `CONFIG_KEY_RESUMING` so `_first` re-applies input and recreates ephemeral routing channels.

## Tests

New test files `test_time_travel.py` (~2500 lines) and `test_time_travel_async.py` (~2200 lines) covering:

- Replay and fork with interrupts (single and multiple)
- Replay and fork for graphs with and without subgraphs
- Correct subgraph checkpoint restoration during parent time travel
- `get_state` with subgraph state during replay
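The first-visit rule in `ReplayState` can be illustrated with a minimal, self-contained Python sketch. Everything in it is hypothetical and is not the code in `_replay.py`: `ToyCheckpointer` only imitates the shape of `checkpointer.list(config, before=..., limit=...)` and `get_tuple(config)`, checkpoint IDs are assumed to be strings that sort in creation order, and `|` / `:` are assumed to separate namespace segments and task-id suffixes. For a fork, the upper bound passed in would be the fork's parent checkpoint ID.

```python
from typing import Any, Iterator, Optional

NS_SEP = "|"  # assumption: separates nested namespace segments
NS_END = ":"  # assumption: separates a node name from its task-id suffix

Row = tuple[str, dict[str, Any]]  # (checkpoint_id, state)


def strip_task_ids(checkpoint_ns: str) -> str:
    """'outer:abc|inner:def' -> 'outer|inner', so one logical subgraph keeps one key."""
    return NS_SEP.join(part.split(NS_END)[0] for part in checkpoint_ns.split(NS_SEP))


class ToyCheckpointer:
    """Hypothetical in-memory saver keyed by (thread_id, stripped namespace)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], list[Row]] = {}

    def put(self, thread_id: str, ns: str, checkpoint_id: str, state: dict[str, Any]) -> None:
        self._rows.setdefault((thread_id, ns), []).append((checkpoint_id, state))

    def list(self, config: dict, *, before: Optional[dict] = None,
             limit: Optional[int] = None) -> Iterator[Row]:
        conf = config["configurable"]
        rows = self._rows.get((conf["thread_id"], conf["checkpoint_ns"]), [])
        rows = sorted(rows, key=lambda row: row[0], reverse=True)  # newest first
        if before is not None:
            bound = before["configurable"]["checkpoint_id"]
            rows = [row for row in rows if row[0] < bound]  # strictly older than bound
        return iter(rows if limit is None else rows[:limit])

    def get_tuple(self, config: dict) -> Optional[Row]:
        return next(self.list(config, limit=1), None)  # latest checkpoint, if any


class ReplayStateSketch:
    """Loads the pre-replay checkpoint once per logical subgraph namespace."""

    def __init__(self, upper_bound_checkpoint_id: str) -> None:
        self.upper_bound = upper_bound_checkpoint_id
        self.visited: set[str] = set()

    def get_checkpoint(self, saver: ToyCheckpointer, config: dict) -> Optional[Row]:
        conf = config["configurable"]
        ns = strip_task_ids(conf["checkpoint_ns"])
        lookup = {"configurable": {**conf, "checkpoint_ns": ns}}
        if ns in self.visited:
            return saver.get_tuple(lookup)  # later visit: normal latest loading
        self.visited.add(ns)
        before = {"configurable": {"checkpoint_id": self.upper_bound}}
        return next(saver.list(lookup, before=before, limit=1), None)


saver = ToyCheckpointer()
saver.put("t1", "sub", "0001", {"step": "old-a"})
saver.put("t1", "sub", "0003", {"step": "old-b"})  # created after the replay point
replay = ReplayStateSketch(upper_bound_checkpoint_id="0002")

cfg_first = {"configurable": {"thread_id": "t1", "checkpoint_ns": "sub:task-1"}}
first = replay.get_checkpoint(saver, cfg_first)  # pre-replay checkpoint "0001"

saver.put("t1", "sub", "0004", {"step": "new"})  # written during this replay run
cfg_second = {"configurable": {"thread_id": "t1", "checkpoint_ns": "sub:task-2"}}
second = replay.get_checkpoint(saver, cfg_second)  # same logical ns: latest "0004"
```

Because the task-id suffix is stripped, `sub:task-1` and `sub:task-2` share one entry in `visited`, so only the first visit is pinned to the historical checkpoint.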
Christian Bromann (christian-bromann) added a commit to langchain-ai/langgraphjs that referenced this pull request on Jun 10, 2026
## Summary

Ports Python time-travel fixes ([#7038](langchain-ai/langgraph#7038), [#7115](langchain-ai/langgraph#7115), [#7498](langchain-ai/langgraph#7498), [#7499](langchain-ai/langgraph#7499)) into `@langchain/langgraph` so replay/fork behave correctly with interrupts and nested subgraphs.

- **Stale `RESUME` on replay**: Replaying from a checkpoint before an interrupt no longer consumes cached resume writes; interrupts re-fire with the correct payload.
- **Subgraph checkpoint loading on time travel**: Introduces `ReplayState` (`CONFIG_KEY_REPLAY_STATE`) so nested subgraphs load the checkpoint that existed at the replay point on first visit, then resume normal head loading within the same run.
- **Parent fork checkpoints on replay**: Time travel runs through `PregelLoop._first()` (not `stream()` delegation on the parent `Pregel`), creating an eager `source: "fork"` checkpoint and propagating `ReplayState` to subgraphs.
- **Direct-to-subgraph time travel**: `getState()` subgraph delegation is guarded with `CONFIG_KEY_READ`; direct subgraph configs strip stale `RESUME` writes and prefer an explicit `checkpoint_id` over `checkpoint_map` when both are set.
- **Streaming**: Fixes the subgraph interrupt namespace when streaming with `subgraphs: true` (an empty `checkpoint_ns` no longer becomes `[""]`; the parent emits interrupts under the deepest `checkpoint_map` namespace).

Closes #2325 (supersedes the earlier partial port).
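The `RESUME`-stripping rule shared by both ports can be sketched in a few lines of Python. This is a hypothetical helper, not the code in `loop.ts` or `_loop.py`: pending writes are modeled as `(task_id, channel, value)` tuples, and the `"__resume__"` channel name and the `has_resume_command` flag (standing in for "the caller passed `Command(resume=...)`") are assumptions.

```python
from typing import Any

RESUME = "__resume__"  # assumed channel name for cached interrupt answers

PendingWrite = tuple[str, str, Any]  # (task_id, channel, value)


def writes_to_apply(
    pending: list[PendingWrite], *, is_replaying: bool, has_resume_command: bool
) -> list[PendingWrite]:
    """Return the pending writes a run should reuse.

    On a plain replay, cached RESUME writes are dropped so interrupt() fires
    again; a genuine resume (Command(resume=...)) keeps them untouched.
    """
    if is_replaying and not has_resume_command:
        return [write for write in pending if write[1] != RESUME]
    return list(pending)


pending = [("task-1", "messages", "hi"), ("task-1", RESUME, "old answer")]
replayed = writes_to_apply(pending, is_replaying=True, has_resume_command=False)
resumed = writes_to_apply(pending, is_replaying=True, has_resume_command=True)
```

Only the `RESUME` channel is filtered; other cached writes from the replayed checkpoint are left alone.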
### Implementation notes

| Area | Change |
|------|--------|
| `pregel/replay.ts` | New `ReplayState` class (mirrors Python) |
| `pregel/loop.ts` | Replay/time-travel detection, fork creation, `RESUME` stripping, `ReplayState` wiring, stream namespace helpers |
| `pregel/index.ts` | `getState` subgraph delegation guard only (removed `stream()` bypass that skipped parent fork creation) |
| Tests | `time_travel.test.ts` (14), `time_travel_extended.test.ts` (33), shared `time_travel_helpers.ts`, Vitest matchers `toBeInterrupted` / `toHaveInterruptValue` |

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
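The streaming namespace fix in the summary can be approximated with a rough Python analogue of the stream namespace helpers. The helper names are hypothetical and this is not the `loop.ts` implementation. It assumes `|` separates namespace segments and reads "deepest `checkpoint_map` namespace" as the key with the most segments. It guards against a specific bug: splitting the root namespace `""` naively yields one empty segment (`"".split("|")` is `[""]` in Python, just as in TypeScript).

```python
NS_SEP = "|"  # assumed separator between nested namespace segments


def ns_to_path(checkpoint_ns: str) -> tuple[str, ...]:
    """'parent:1|child:2' -> ('parent:1', 'child:2'); the root '' -> ()."""
    # A naive "".split("|") returns [""], which would label root events as
    # coming from a subgraph with an empty name.
    return tuple(checkpoint_ns.split(NS_SEP)) if checkpoint_ns else ()


def deepest_namespace(checkpoint_map: dict[str, str]) -> tuple[str, ...]:
    """Namespace path with the most segments among checkpoint_map keys."""
    return max((ns_to_path(ns) for ns in checkpoint_map), key=len, default=())


checkpoint_map = {"": "ckpt-0", "parent:1": "ckpt-1", "parent:1|child:2": "ckpt-2"}
interrupt_ns = deepest_namespace(checkpoint_map)  # ('parent:1', 'child:2')
```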