fix: time travel when going back to interrupt node #7498
Merged
William FH (hinthornw) approved these changes on Apr 13, 2026
Sydney Runkle (sydney-runkle) added a commit that referenced this pull request on Apr 27, 2026
…ist (#7582)

## Summary

Fixes #7498: `MESSAGE_COERCION_FAILURE` when resuming threads checkpointed before v1.0.1.

**Root cause:** PR #6269 (v1.0.1) added an `_allowed_json_modules` security gate to `JsonPlusSerializer._reviver`. The gate defaults to `None`, so old `"json"`-format checkpoint blobs containing `lc=2` constructor dicts (the pre-msgpack serialization format for pydantic objects like `HumanMessage`) are now returned as raw dicts instead of being reconstructed. Those raw dicts reach `add_messages → convert_to_messages`, which sees `type="constructor"` and raises `MESSAGE_COERCION_FAILURE`. Fresh first-turn messages are unaffected because current `dumps_typed` only writes `"msgpack"` blobs.

**Fix:** `_reviver` now reconstructs `lc=2` blobs whose target class is already in `SAFE_MSGPACK_TYPES`, the same curated allowlist already used by the msgpack deserialization path (includes all standard LangChain message types). Unknown classes are still blocked, preserving the security intent of #6269.

## Changes

- `libs/checkpoint/langgraph/checkpoint/serde/jsonplus.py`: add `_is_safe_json_type()` helper; update `_reviver` and `_check_allowed_json_modules` to allow safe types without an explicit allowlist
- `libs/checkpoint/tests/test_jsonplus.py`: two new regression tests: safe-type `lc=2` blobs revive correctly; unknown-type `lc=2` blobs stay blocked

## Test plan

- [ ] `test_lc2_json_safe_type_revives_without_allowlist`: `HumanMessage`/`AIMessage` lc=2 JSON blobs round-trip to proper `BaseMessage` objects with no allowlist configured
- [ ] `test_lc2_json_unknown_type_stays_blocked_without_allowlist`: `pprint.pprint` lc=2 blob still returns raw dict (not reconstructed)
- [ ] `test_deserde_invalid_module`: existing behaviour unchanged
- [ ] Full `test_jsonplus.py` suite: 93/93 passing

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
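The allowlist behaviour described in this commit message can be illustrated with a small standard-library sketch. It is not the `JsonPlusSerializer` code: `SAFE_TYPES`, `revive`, `loads`, the stand-in `HumanMessage` dataclass, and the simplified `lc=2` dict layout are all assumptions made for illustration. It only shows the idea that constructor dicts are rebuilt when the target class is on a curated allowlist and are otherwise left as raw dicts.

```python
import json
from dataclasses import dataclass


@dataclass
class HumanMessage:
    """Stand-in message class for this sketch; not the LangChain type."""
    content: str


# Hypothetical curated allowlist, keyed by (module path..., class name).
SAFE_TYPES = {("toy", "messages", "HumanMessage"): HumanMessage}


def revive(value, allowed=None):
    """Rebuild an lc=2 constructor dict only if its class is allowlisted."""
    is_constructor = (
        isinstance(value, dict)
        and value.get("lc") == 2
        and value.get("type") == "constructor"
    )
    if not is_constructor:
        return value
    key = tuple(value.get("id", ()))
    # Safe types need no explicit allowlist; others must be opted in by the caller.
    cls = SAFE_TYPES.get(key) or (allowed or {}).get(key)
    if cls is None:
        return value  # unknown class: leave the raw dict untouched
    return cls(**value.get("kwargs", {}))


def loads(blob, allowed=None):
    """Decode JSON, passing every decoded dict through the reviver."""
    return json.loads(blob, object_hook=lambda d: revive(d, allowed))
```

With this sketch, a blob whose `id` is `["toy", "messages", "HumanMessage"]` decodes to a `HumanMessage` instance, while a blob whose `id` is `["pprint", "pprint"]` comes back as a plain dict.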
Christian Bromann (christian-bromann) added a commit to langchain-ai/langgraphjs that referenced this pull request on Jun 10, 2026
## Summary

Ports Python time-travel fixes ([#7038](langchain-ai/langgraph#7038), [#7115](langchain-ai/langgraph#7115), [#7498](langchain-ai/langgraph#7498), [#7499](langchain-ai/langgraph#7499)) into `@langchain/langgraph` so replay/fork behave correctly with interrupts and nested subgraphs.

- **Stale `RESUME` on replay**: Replaying from a checkpoint before an interrupt no longer consumes cached resume writes; interrupts re-fire with the correct payload.
- **Subgraph checkpoint loading on time travel**: Introduces `ReplayState` (`CONFIG_KEY_REPLAY_STATE`) so nested subgraphs load the checkpoint that existed at the replay point on first visit, then resume normal head loading within the same run.
- **Parent fork checkpoints on replay**: Time travel runs through `PregelLoop._first()` (not `stream()` delegation on the parent `Pregel`), creating an eager `source: "fork"` checkpoint and propagating `ReplayState` to subgraphs.
- **Direct-to-subgraph time travel**: `getState()` subgraph delegation is guarded with `CONFIG_KEY_READ`; direct subgraph configs strip stale `RESUME` writes and prefer explicit `checkpoint_id` over `checkpoint_map` when both are set.
- **Streaming**: Fixes subgraph interrupt namespace when streaming with `subgraphs: true` (empty `checkpoint_ns` no longer becomes `[""]`; parent emits interrupts under the deepest `checkpoint_map` namespace).

Closes #2325 (supersedes the earlier partial port).
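The "first visit loads the replay-point checkpoint, later visits load the head" rule for subgraphs can be sketched as below. This is a toy model in Python, not the `ReplayState` class from either codebase: `ToyReplayState`, its fields, and `pick` are hypothetical names, and the sketch assumes checkpoint IDs sort chronologically and that "existed at the replay point" means the newest checkpoint not newer than the replay ID.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplayState:
    """Hypothetical stand-in for ReplayState; field names are assumptions."""
    replay_checkpoint_id: str
    visited: set = field(default_factory=set)

    def pick(self, namespace, history):
        """Choose which checkpoint a subgraph namespace should load.

        `history` lists that namespace's checkpoint IDs, oldest first.
        """
        if namespace in self.visited:
            # Later visits in the same run: normal head loading.
            return history[-1] if history else None
        self.visited.add(namespace)
        # First visit: newest checkpoint that is not newer than the replay point.
        candidates = [c for c in history if c <= self.replay_checkpoint_id]
        return candidates[-1] if candidates else None
```

For a namespace with history `["001", "002", "004", "005"]` and a replay point of `"003"`, the first `pick` returns `"002"` and a second `pick` in the same run returns `"005"`.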
### Implementation notes

| Area | Change |
|------|--------|
| `pregel/replay.ts` | New `ReplayState` class (mirrors Python) |
| `pregel/loop.ts` | Replay/time-travel detection, fork creation, `RESUME` stripping, `ReplayState` wiring, stream namespace helpers |
| `pregel/index.ts` | `getState` subgraph delegation guard only (removed `stream()` bypass that skipped parent fork creation) |
| Tests | `time_travel.test.ts` (14), `time_travel_extended.test.ts` (33), shared `time_travel_helpers.ts`, Vitest matchers `toBeInterrupted` / `toHaveInterruptValue` |

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Fix: Create fork checkpoint on subgraph time travel
Problem
When time-traveling to a subgraph checkpoint that has an interrupt, and then resuming, the resume would load the wrong state: it would pick up the original execution's latest checkpoint instead of the time-traveled one.

This happened because replaying from a subgraph checkpoint never created a new parent checkpoint. If the replay hit an interrupt before `after_tick()` ran, no checkpoint was written at all, so the parent's "latest" checkpoint was still the old one from the original execution.

Fix
When the loop detects a time-travel replay (not an `update_state` fork), it now eagerly writes a fork checkpoint at the start of the tick. This ensures:
- `Command(resume=...)` calls find the correct checkpoint
- `INTERRUPT` pending writes from the old checkpoint are cleared (they reference old task IDs)

Additionally, the subgraph replay logic now uses the parent checkpoint ID (from `prev_checkpoint_config`) when resolving subgraph checkpoints during time-travel, matching the existing behavior for `update_state` forks.

Checkpoint flow diagrams
- Before fix: time travel leaves no fork
- After fix: time travel creates a fork
- Manual fork via `update_state` (unchanged)

Changes
`libs/langgraph/langgraph/pregel/_loop.py`:
- Extract the `is_time_traveling` flag from the existing replay detection logic for reuse
- Create a fork checkpoint (`source="fork"`) eagerly at the start of a time-travel tick, before execution begins
- Clear `INTERRUPT` pending writes when creating the fork (they reference old task IDs that won't match the new checkpoint)
- Subgraph replay logic now checks `source in ("update", "fork")` instead of a separate `is_time_traveling` condition, since the new fork checkpoint now has `source="fork"`

`libs/langgraph/tests/test_time_travel.py` and `test_time_travel_async.py`: Added 4 new test cases (sync + async):
- `test_replay_from_before_interrupt_then_resume`: replays from a checkpoint before an interrupt, resumes with a new answer, and verifies the full checkpoint history (source, next, values) at each stage
- `test_subgraph_time_travel_resume_from_first_interrupt`: time-travels to a subgraph's first interrupt, resumes both interrupts with new answers, and verifies the fork creates a new branch while preserving the original
- `test_subgraph_time_travel_resume_from_second_interrupt`: time-travels to a subgraph's second interrupt, resumes with a new answer, and verifies the first interrupt's original answer is preserved
- `test_subgraph_time_travel_checkpoint_pattern`: verifies the fork checkpoint branches from the correct replay point and that the full checkpoint tree is correct after resume

`libs/langgraph/tests/test_pregel.py` / `test_pregel_async.py`: Updated existing `test_weather_subgraph_state` to account for the new fork checkpoint appearing in history (history length increases by 1)
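To make the effect of the eager fork concrete, here is a toy in-memory model of the checkpoint flow. It is a sketch under stated assumptions and not the `_loop.py` implementation: `ToyStore`, `Checkpoint`, `replay`, and the `"INTERRUPT"` channel string are hypothetical, and the replay is assumed to hit an interrupt before any tick completes, so nothing else gets written.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Checkpoint:
    id: int
    parent: Optional[int]
    source: str
    # Each pending write is a (task_id, channel, value) tuple.
    pending_writes: list = field(default_factory=list)


class ToyStore:
    """Append-only checkpoint list; 'latest' is simply the last one written."""

    def __init__(self):
        self._ids = count(1)
        self.checkpoints = []

    def put(self, parent, source, pending_writes=()):
        cp = Checkpoint(next(self._ids), parent, source, list(pending_writes))
        self.checkpoints.append(cp)
        return cp

    def latest(self):
        return self.checkpoints[-1]


def replay(store, target, eager_fork):
    """Replay from `target` and return the checkpoint a later resume would load."""
    if eager_fork:
        # Drop interrupt writes: they reference task IDs from the old run.
        kept = [w for w in target.pending_writes if w[1] != "INTERRUPT"]
        store.put(parent=target.id, source="fork", pending_writes=kept)
    # The replay is assumed to stop at an interrupt before any tick finishes,
    # so no further checkpoint is written here.
    return store.latest()
```

Without the eager fork, `replay` returns the old head of the original run, which mirrors the bug in the Problem section; with it, the returned checkpoint has `source="fork"`, its parent is the replay target, and the stale interrupt write is gone.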