fix: replay bug, direct to subgraphs #7115

Merged
Sydney Runkle (sydney-runkle) merged 12 commits into main from sr/fix-resume-bug on Mar 11, 2026
Sydney Runkle (sydney-runkle) commented Mar 11, 2026
Summary

Fixes two bugs when time-traveling to a subgraph checkpoint (graph.invoke(None, sub_config)):

  1. Wrong checkpoint loaded: The old __enter__ checked replay_state before explicit checkpoint_id for nested graphs. When time-traveling, checkpoint_map resolves a specific checkpoint_id for the target subgraph, but replay_state.get_checkpoint() would find an earlier checkpoint (via list(before=parent_ckpt_id)), causing the subgraph to re-execute from the beginning (step_a, ask_1, etc. all re-ran). Fix: check for explicit checkpoint_id first.

  2. Stale RESUME writes kept: The parent sets RESUMING=True on subgraph configs — it can't distinguish time-travel from normal resume. The subgraph sees RESUMING=True and preserves old RESUME writes, so interrupt() returns cached answers instead of re-firing. Fix: check whether the subgraph's own namespace appears in checkpoint_map (it only does during time-travel — normally the map only has ancestor entries). When present, force-strip RESUME writes.
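
The two fixes above can be sketched in plain Python. This is a simplified illustration, not the code from the PR: the function names, the `(task_id, channel, value)` write tuples, and the `RESUME` stand-in value are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any, Optional

# Stand-in for the reserved RESUME write channel (assumed value for this sketch).
RESUME = "__resume__"


def pick_checkpoint_id(
    explicit_checkpoint_id: Optional[str],
    replay_checkpoint_id: Optional[str],
) -> Optional[str]:
    """Fix 1: an explicit checkpoint_id wins over the replay-state lookup."""
    if explicit_checkpoint_id is not None:
        return explicit_checkpoint_id
    return replay_checkpoint_id


def is_direct_time_travel(checkpoint_ns: str, checkpoint_map: dict[str, str]) -> bool:
    """Fix 2: the subgraph's own namespace is in checkpoint_map only during time travel."""
    return checkpoint_ns in checkpoint_map


def filter_pending_writes(
    writes: list[tuple[str, str, Any]],
    checkpoint_ns: str,
    checkpoint_map: dict[str, str],
    resuming: bool,
) -> list[tuple[str, str, Any]]:
    """Keep RESUME writes only on a normal resume; strip them otherwise."""
    if resuming and not is_direct_time_travel(checkpoint_ns, checkpoint_map):
        return list(writes)
    return [w for w in writes if w[1] != RESUME]
```

In a normal resume the map holds only ancestor entries, so the cached answer survives. When the subgraph's own namespace is present, the RESUME write is dropped and `interrupt()` would fire again.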

Test plan

Tests cover time travel (replay + fork) at different interrupt points for both single-nested and double-nested subgraphs, including cases where the middle subgraph itself has interrupts. All tests have sync and async variants.

Single nested (2 levels: parent → executor subgraph)

  • test_subgraph_time_travel_to_first_interrupt — time travel to 1st interrupt checkpoint; verifies step_a doesn't re-run, ask_1 re-fires (replay + fork)
  • test_subgraph_time_travel_to_second_interrupt — time travel to 2nd interrupt checkpoint; verifies step_a and ask_1 don't re-run, ask_2 re-fires (replay + fork)
  • test_subgraph_time_travel_after_completion — replay from final parent checkpoint after full completion; verifies no nodes re-run and all state values preserved

Double nested (3 levels: parent → outer → inner subgraph)

  • test_3_levels_deep_time_travel_to_first_interrupt — time travel to innermost checkpoint at 1st interrupt (replay + fork)
  • test_3_levels_deep_time_travel_to_second_interrupt — time travel to innermost checkpoint at 2nd interrupt (replay + fork)
  • test_3_levels_deep_time_travel_to_middle_subgraph — time travel to middle-level subgraph checkpoint (replay + fork)
  • test_3_levels_deep_middle_has_interrupts — middle subgraph has its own interrupt() calls (in pre node) plus an inner subgraph with interrupts; time travel to middle checkpoint at each interrupt point (replay + fork at both the middle's own interrupt and the inner's interrupt)

@sydney-runkle Sydney Runkle (sydney-runkle) merged commit f78892d into main Mar 11, 2026
66 checks passed
@sydney-runkle Sydney Runkle (sydney-runkle) deleted the sr/fix-resume-bug branch March 11, 2026 21:59
xingshuozhu1998 pushed a commit to xingshuozhu1998/langgraph that referenced this pull request May 1, 2026
Christian Bromann (christian-bromann) added a commit to langchain-ai/langgraphjs that referenced this pull request Jun 10, 2026
## Summary

Ports Python time-travel fixes
([#7038](langchain-ai/langgraph#7038),
[#7115](langchain-ai/langgraph#7115),
[#7498](langchain-ai/langgraph#7498),
[#7499](langchain-ai/langgraph#7499)) into
`@langchain/langgraph` so replay/fork behave correctly with interrupts
and nested subgraphs.

- **Stale `RESUME` on replay** — Replaying from a checkpoint before an
interrupt no longer consumes cached resume writes; interrupts re-fire
with the correct payload.
- **Subgraph checkpoint loading on time travel** — Introduces
`ReplayState` (`CONFIG_KEY_REPLAY_STATE`) so nested subgraphs load the
checkpoint that existed at the replay point on first visit, then resume
normal head loading within the same run.
- **Parent fork checkpoints on replay** — Time travel runs through
`PregelLoop._first()` (not `stream()` delegation on the parent
`Pregel`), creating an eager `source: "fork"` checkpoint and propagating
`ReplayState` to subgraphs.
- **Direct-to-subgraph time travel** — `getState()` subgraph delegation
is guarded with `CONFIG_KEY_READ`; direct subgraph configs strip stale
`RESUME` writes and prefer explicit `checkpoint_id` over
`checkpoint_map` when both are set.
- **Streaming** — Fixes subgraph interrupt namespace when streaming with
`subgraphs: true` (empty `checkpoint_ns` no longer becomes `[""]`;
parent emits interrupts under the deepest `checkpoint_map` namespace).
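
The streaming fix in the last bullet comes down to a string-splitting edge case that also exists in Python: splitting an empty string yields one empty segment rather than none. A minimal sketch follows, assuming `|` separates namespace segments; the helper names are hypothetical and are not the ones used in `pregel/loop.ts`.

```python
from __future__ import annotations

# Assumed separator between nested namespace segments.
NS_SEP = "|"


def namespace_path(checkpoint_ns: str) -> tuple[str, ...]:
    """Split a checkpoint namespace into segments; the root namespace has none."""
    if not checkpoint_ns:
        # Without this guard, "".split(NS_SEP) would give [""].
        return ()
    return tuple(checkpoint_ns.split(NS_SEP))


def deepest_namespace(checkpoint_map: dict[str, str]) -> str:
    """Return the checkpoint_map key with the most segments ("" if the map is empty)."""
    return max(checkpoint_map, key=lambda ns: len(namespace_path(ns)), default="")
```

With a map such as `{"": "c0", "outer:1": "c1", "outer:1|inner:2": "c2"}`, `deepest_namespace` picks `"outer:1|inner:2"`, which is the namespace the bullet says interrupts are emitted under.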

Closes #2325 (supersedes the earlier partial port).

### Implementation notes

| Area | Change |
|------|--------|
| `pregel/replay.ts` | New `ReplayState` class (mirrors Python) |
| `pregel/loop.ts` | Replay/time-travel detection, fork creation,
`RESUME` stripping, `ReplayState` wiring, stream namespace helpers |
| `pregel/index.ts` | `getState` subgraph delegation guard only (removed
`stream()` bypass that skipped parent fork creation) |
| Tests | `time_travel.test.ts` (14), `time_travel_extended.test.ts`
(33), shared `time_travel_helpers.ts`, Vitest matchers `toBeInterrupted`
/ `toHaveInterruptValue` |
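
For readers unfamiliar with `ReplayState`, the behaviour described in the summary (load the checkpoint from the replay point on first visit, then load the head) can be sketched in Python. This is an illustrative model only: the class name, method name, and the assumption that checkpoint ids sort chronologically as strings are all made up for the example and do not reproduce the real class.

```python
from __future__ import annotations

from typing import Optional


class ReplayStateSketch:
    """Toy model: first visit to a namespace replays, later visits load the head."""

    def __init__(self, replay_checkpoint_id: str) -> None:
        self.replay_checkpoint_id = replay_checkpoint_id
        self._visited: set[str] = set()

    def checkpoint_for(self, checkpoint_ns: str, history: list[str]) -> Optional[str]:
        """Pick a checkpoint id from `history` (this namespace's ids, oldest first)."""
        if checkpoint_ns in self._visited:
            # Already replayed in this run: load the newest checkpoint (the head).
            return history[-1] if history else None
        self._visited.add(checkpoint_ns)
        # First visit: newest checkpoint that existed before the replay point.
        earlier = [c for c in history if c < self.replay_checkpoint_id]
        return earlier[-1] if earlier else None
```

For example, with a replay point of `"c3"` and a subgraph history of `["c1", "c2", "c4"]`, the first call returns `"c2"` and a second call for the same namespace returns `"c4"`.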

---------

Co-authored-by: Cursor <cursoragent@cursor.com>