fix: replay bug, direct to subgraphs #7115
Merged
…anggraph into sr/fix-resume-bug
William FH (hinthornw)
approved these changes
Mar 11, 2026
Closed
5 tasks
xingshuozhu1998
pushed a commit
to xingshuozhu1998/langgraph
that referenced
this pull request
May 1, 2026
## Summary

Fixes two bugs when time-traveling to a subgraph checkpoint (`graph.invoke(None, sub_config)`):

1. **Wrong checkpoint loaded:** The old `__enter__` checked `replay_state` before explicit `checkpoint_id` for nested graphs. When time-traveling, `checkpoint_map` resolves a specific checkpoint_id for the target subgraph, but `replay_state.get_checkpoint()` would find an *earlier* checkpoint (via `list(before=parent_ckpt_id)`), causing the subgraph to re-execute from the beginning (step_a, ask_1, etc. all re-ran). Fix: check for explicit `checkpoint_id` first.
2. **Stale RESUME writes kept:** The parent sets `RESUMING=True` on subgraph configs because it can't distinguish time-travel from normal resume. The subgraph sees `RESUMING=True` and preserves old RESUME writes, so `interrupt()` returns cached answers instead of re-firing. Fix: check whether the subgraph's own namespace appears in `checkpoint_map` (it only does during time-travel; normally the map only has ancestor entries). When present, force-strip RESUME writes.

## Test plan

Tests cover time travel (replay + fork) at different interrupt points for both single-nested and double-nested subgraphs, including cases where the middle subgraph itself has interrupts. All tests have sync and async variants.

### Single nested (2 levels: parent → executor subgraph)

- `test_subgraph_time_travel_to_first_interrupt`: time travel to 1st interrupt checkpoint; verifies step_a doesn't re-run, ask_1 re-fires (replay + fork)
- `test_subgraph_time_travel_to_second_interrupt`: time travel to 2nd interrupt checkpoint; verifies step_a and ask_1 don't re-run, ask_2 re-fires (replay + fork)
- `test_subgraph_time_travel_after_completion`: replay from final parent checkpoint after full completion; verifies no nodes re-run and all state values preserved

### Double nested (3 levels: parent → outer → inner subgraph)

- `test_3_levels_deep_time_travel_to_first_interrupt`: time travel to innermost checkpoint at 1st interrupt (replay + fork)
- `test_3_levels_deep_time_travel_to_second_interrupt`: time travel to innermost checkpoint at 2nd interrupt (replay + fork)
- `test_3_levels_deep_time_travel_to_middle_subgraph`: time travel to middle-level subgraph checkpoint (replay + fork)
- `test_3_levels_deep_middle_has_interrupts`: middle subgraph has its own `interrupt()` calls (in `pre` node) plus an inner subgraph with interrupts; time travel to middle checkpoint at each interrupt point (replay + fork at both the middle's own interrupt and the inner's interrupt)
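The two fixes in the summary can be illustrated with a small standalone sketch. This is not the LangGraph implementation: `SubgraphConfig`, `choose_checkpoint_id`, `is_direct_time_travel`, and `filter_pending_writes` are hypothetical names, the `"__resume__"` channel name is an assumption, and pending writes are simplified to `(channel, value)` pairs. It only models the ordering and the namespace check described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed channel name for resume writes; illustrative only.
RESUME = "__resume__"


@dataclass
class SubgraphConfig:
    """Hypothetical, simplified view of the config a subgraph receives."""
    checkpoint_ns: str
    checkpoint_id: Optional[str] = None
    checkpoint_map: dict = field(default_factory=dict)
    resuming: bool = False


def choose_checkpoint_id(
    config: SubgraphConfig,
    replay_lookup: Optional[Callable[[str], Optional[str]]],
    head_id: Optional[str],
) -> Optional[str]:
    """Fix 1: an explicit checkpoint_id wins over the replay-state lookup."""
    if config.checkpoint_id is not None:
        return config.checkpoint_id
    if replay_lookup is not None:
        found = replay_lookup(config.checkpoint_ns)
        if found is not None:
            return found
    # Neither an explicit id nor a replay hit: load the latest checkpoint.
    return head_id


def is_direct_time_travel(config: SubgraphConfig) -> bool:
    """The subgraph's own namespace is in checkpoint_map only on time travel."""
    return config.checkpoint_ns in config.checkpoint_map


def filter_pending_writes(config: SubgraphConfig, pending_writes: list) -> list:
    """Fix 2: keep RESUME writes only on a normal resume, not on time travel."""
    if config.resuming and not is_direct_time_travel(config):
        return list(pending_writes)
    return [w for w in pending_writes if w[0] != RESUME]
```

With a config whose own namespace is a key of `checkpoint_map`, the explicit id is chosen and the cached resume write is dropped, so an `interrupt()` would fire again; with only ancestor entries in the map, the replay lookup and the resume writes are both kept.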
Christian Bromann (christian-bromann)
added a commit
to langchain-ai/langgraphjs
that referenced
this pull request
Jun 10, 2026
## Summary

Ports Python time-travel fixes ([#7038](langchain-ai/langgraph#7038), [#7115](langchain-ai/langgraph#7115), [#7498](langchain-ai/langgraph#7498), [#7499](langchain-ai/langgraph#7499)) into `@langchain/langgraph` so replay/fork behave correctly with interrupts and nested subgraphs.

- **Stale `RESUME` on replay:** Replaying from a checkpoint before an interrupt no longer consumes cached resume writes; interrupts re-fire with the correct payload.
- **Subgraph checkpoint loading on time travel:** Introduces `ReplayState` (`CONFIG_KEY_REPLAY_STATE`) so nested subgraphs load the checkpoint that existed at the replay point on first visit, then resume normal head loading within the same run.
- **Parent fork checkpoints on replay:** Time travel runs through `PregelLoop._first()` (not `stream()` delegation on the parent `Pregel`), creating an eager `source: "fork"` checkpoint and propagating `ReplayState` to subgraphs.
- **Direct-to-subgraph time travel:** `getState()` subgraph delegation is guarded with `CONFIG_KEY_READ`; direct subgraph configs strip stale `RESUME` writes and prefer explicit `checkpoint_id` over `checkpoint_map` when both are set.
- **Streaming:** Fixes subgraph interrupt namespace when streaming with `subgraphs: true` (empty `checkpoint_ns` no longer becomes `[""]`; parent emits interrupts under the deepest `checkpoint_map` namespace).

Closes #2325 (supersedes the earlier partial port).

### Implementation notes

| Area | Change |
|------|--------|
| `pregel/replay.ts` | New `ReplayState` class (mirrors Python) |
| `pregel/loop.ts` | Replay/time-travel detection, fork creation, `RESUME` stripping, `ReplayState` wiring, stream namespace helpers |
| `pregel/index.ts` | `getState` subgraph delegation guard only (removed `stream()` bypass that skipped parent fork creation) |
| Tests | `time_travel.test.ts` (14), `time_travel_extended.test.ts` (33), shared `time_travel_helpers.ts`, Vitest matchers `toBeInterrupted` / `toHaveInterruptValue` |

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
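As a rough illustration of two behaviors listed above (first-visit checkpoint loading and the empty-namespace streaming fix), here is a minimal Python sketch. It does not reproduce `pregel/replay.ts` or the Python `ReplayState`: `ReplayStateSketch`, `pick`, and `namespace_path` are hypothetical names, the `"|"` separator is an assumption, and checkpoint ids are assumed to sort chronologically as plain strings.

```python
from typing import Optional

# Assumed namespace separator; illustrative only.
NS_SEP = "|"


class ReplayStateSketch:
    """Hypothetical model: the first visit to a namespace loads the checkpoint
    that existed at the replay point; later visits load the head."""

    def __init__(self, replay_checkpoint_id: str) -> None:
        self.replay_checkpoint_id = replay_checkpoint_id
        self._visited: set = set()

    def pick(self, checkpoint_ns: str, history: list) -> Optional[str]:
        # `history` holds this namespace's checkpoint ids, oldest first.
        if checkpoint_ns in self._visited:
            return history[-1] if history else None
        self._visited.add(checkpoint_ns)
        # First visit: latest checkpoint strictly before the replay point.
        earlier = [c for c in history if c < self.replay_checkpoint_id]
        return earlier[-1] if earlier else None


def namespace_path(checkpoint_ns: str) -> list:
    """Split a namespace into segments, mapping the root ("") to an empty path."""
    # A bare "".split(NS_SEP) yields [""], which is the pitfall described above.
    return checkpoint_ns.split(NS_SEP) if checkpoint_ns else []
```

Tracking visited namespaces on the replay object keeps the "first visit only" rule local to one run, so a second pass through the same subgraph in that run sees its newest checkpoint rather than the historical one.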