feat(langgraph): `DeltaChannel`: store sentinel in blobs, reconstruct from checkpoint_writes by sydney-runkle · Pull Request #7586 · langchain-ai/langgraph

Sydney Runkle (sydney-runkle) · 2026-04-22T17:09:56Z

DeltaChannel: stop checkpointing the same data over and over

The problem

LangGraph serializes the full accumulated state into a checkpoint blob at every step. For a 100-turn conversation, that's 100 increasingly large copies of the same message list, even though each step only added one message. At 500 turns, a thread using add_messages accumulates 219 MB of blobs alone.

What this PR does

Introduces DeltaChannel: a reducer channel that stores only a zero-byte sentinel in checkpoint blobs instead of the full accumulated value. On restore, the runtime walks the ancestor chain, collects the per-step writes, and replays them through the reducer to reconstruct state.

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.channels.delta import DeltaChannel
from langgraph.graph.message import _messages_delta_reducer

class State(TypedDict):
    # blob per step: ~60 bytes (sentinel) instead of a growing full list
    messages: Annotated[list, DeltaChannel(_messages_delta_reducer)]

    # bound read depth to at most 50 steps via periodic snapshots
    messages_bounded: Annotated[list, DeltaChannel(_messages_delta_reducer, snapshot_frequency=50)]

Storage savings

snapshot_frequency=N writes a full-value blob every N steps, bounding replay depth. Without snapshots, blob storage is near zero, but read latency grows without bound as the thread gets longer, which is not practical for production.

Total checkpoint storage (blobs + writes + metadata, ~400 chars/message):

turns	`add_messages`	`DeltaChannel` (no snapshots)	`DeltaChannel(snapshot_frequency=50)`
10	129.7 KB	38.7 KB (3x) †	38.7 KB (3x)
100	9.18 MB	394 KB (23x) †	703 KB (13x)
250	55.79 MB	987 KB (57x) †	3.07 MB (18x)
500	221 MB	1.98 MB (112x) †	10.53 MB (21x)

† No snapshots trades read latency for maximum storage savings — see below.
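The gap widens with thread length because full-state blobs grow quadratically in total, while per-step writes grow linearly. A back-of-envelope sketch of that shape, assuming exactly one fixed-size message per step and ignoring metadata and serialization overhead (so it will not reproduce the measured numbers above; all function names here are illustrative):

```python
def full_blob_bytes(steps: int, msg_bytes: int = 400) -> int:
    """Total blob bytes when every step stores the whole accumulated list."""
    # Step k stores k messages, so the total is msg_bytes * (1 + 2 + ... + steps).
    return msg_bytes * steps * (steps + 1) // 2


def delta_bytes(steps: int, msg_bytes: int = 400) -> int:
    """Total bytes when every step stores only the one message it added."""
    return msg_bytes * steps


def snapshot_blob_bytes(steps: int, every: int, msg_bytes: int = 400) -> int:
    """Extra bytes for full copies taken at steps every, 2*every, ..."""
    return sum(msg_bytes * k for k in range(every, steps + 1, every))


for steps in (10, 100, 1000):
    full = full_blob_bytes(steps)
    delta = delta_bytes(steps)
    with_snapshots = delta + snapshot_blob_bytes(steps, every=50)
    print(steps, full, delta, with_snapshots)
```

In this simplified model the ratio between the first two columns is (steps + 1) / 2, which is why the savings keep growing as threads get longer.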

The honest tradeoff: read latency

Without snapshots, every get_state walks the full ancestor chain — latency grows linearly with thread length.

Read latency (get_state, InMemory, 5-call avg):

turns	`add_messages`	`DeltaChannel` (no snapshots)	`DeltaChannel(snapshot_frequency=50)`
10	0.7 ms	1.1 ms †	1.1 ms
100	5.5 ms	11.1 ms †	6.0 ms
250	12.9 ms	27.2 ms †	13.6 ms

† Grows unboundedly with thread length.
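One way to picture why snapshots flatten the curve is to count how many steps of writes a read has to replay. The sketch below assumes a snapshot lands on every step that is a multiple of snapshot_frequency; the actual placement in the implementation may differ, and replay_depth is a hypothetical helper, not a LangGraph function.

```python
from typing import Optional


def replay_depth(step: int, snapshot_frequency: Optional[int] = None) -> int:
    """Number of steps whose writes must be replayed to rebuild state at `step`."""
    if snapshot_frequency is None:
        # No snapshots: walk all the way back to the start of the thread.
        return step
    # With snapshots: walk back only to the nearest snapshot step.
    return step % snapshot_frequency


for step in (10, 100, 249, 250):
    print(step, replay_depth(step), replay_depth(step, snapshot_frequency=50))
```

Under this assumption the depth with snapshots never reaches snapshot_frequency, regardless of how long the thread gets.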

Reducer requirements

DeltaChannel takes a batch reducer (state, list[writes]) -> state. Two requirements:

Deterministic: same inputs always produce the same output.
Associative (batching-invariant): applying writes in two batches must equal applying them all at once — reducer(reducer(s, [a, b]), [c]) == reducer(s, [a, b, c]). This matters because LangGraph may replay writes in different batch sizes than they were originally produced; a non-associative reducer would silently reconstruct wrong state.

If your reducer isn't associative, use BinaryOperatorAggregate instead — DeltaChannel is not a drop-in replacement for every reducer.
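The batching-invariance requirement is easy to spot-check before adopting DeltaChannel. A minimal sketch in plain Python follows; is_batching_invariant and both sample reducers are hypothetical names for illustration, and a passing check on a few inputs is evidence, not proof.

```python
from typing import Any, Callable


def is_batching_invariant(
    reducer: Callable[[Any, list], Any], state: Any, writes: list
) -> bool:
    """Compare every two-way split of `writes` against applying them in one batch."""
    expected = reducer(state, list(writes))
    for i in range(1, len(writes)):
        head, tail = list(writes[:i]), list(writes[i:])
        if reducer(reducer(state, head), tail) != expected:
            return False
    return True


def append_all(state: list, writes: list) -> list:
    # Plain concatenation: the grouping of writes does not affect the result.
    return state + writes


def append_and_count(state: list, writes: list) -> list:
    # Records the batch size, so the result depends on how writes were grouped.
    return state + writes + [len(writes)]


print(is_batching_invariant(append_all, [], ["a", "b", "c"]))        # True
print(is_batching_invariant(append_and_count, [], ["a", "b", "c"]))  # False
```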

# ❌ Wrong — add_messages is a binary operator (value, value) -> value, not a batch reducer
messages: Annotated[list, DeltaChannel(add_messages)]

# ✅ Correct — _messages_delta_reducer is a batch reducer with dedup + RemoveMessage support
messages: Annotated[list, DeltaChannel(_messages_delta_reducer)]

# ✅ Custom batch reducer
def my_dict_reducer(state: dict, writes: list[dict]) -> dict:
    result = dict(state)
    for w in writes:
        result.update(w)
    return result

files: Annotated[dict, DeltaChannel(my_dict_reducer)]
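If you already have a binary operator of the form (value, update) -> value, one possible way to get a batch reducer is to left-fold it over the writes. This is a sketch only: as_batch_reducer is a hypothetical helper, not part of LangGraph, and you would still need to confirm that the folded result matches what your existing channel produces. For messages, this PR provides _messages_delta_reducer instead.

```python
from functools import reduce
from typing import Callable, TypeVar

S = TypeVar("S")
W = TypeVar("W")


def as_batch_reducer(op: Callable[[S, W], S]) -> Callable[[S, list], S]:
    """Wrap a binary operator so it accepts (state, list_of_writes)."""

    def batch(state: S, writes: list) -> S:
        # Apply the writes one at a time, in order, starting from `state`.
        return reduce(op, writes, state)

    return batch


def merge(left: dict, right: dict) -> dict:
    # Binary operator: keys in `right` override keys in `left`.
    return {**left, **right}


merge_batch = as_batch_reducer(merge)
print(merge_batch({"a": 1}, [{"b": 2}, {"a": 3}]))  # {'a': 3, 'b': 2}
```

A left fold is batching-invariant by construction: folding [a, b] and then [c] makes the same sequence of operator calls as folding [a, b, c] at once.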

Backwards compatibility: no migration required

Threads written under BinaryOperatorAggregate (including add_messages) work transparently after swapping the annotation — existing checkpoints are not touched.

# Before
messages: Annotated[list, add_messages]

# After — existing threads continue from where they left off
messages: Annotated[list, DeltaChannel(_messages_delta_reducer)]

How: when loading state, DeltaChannel asks the saver to walk the ancestor chain and collect per-step writes. The walk stops as soon as it hits an ancestor whose stored blob is a real value (not a sentinel). Old BinaryOperatorAggregate checkpoints store the full accumulated list at every step, so the walk terminates at the nearest pre-migration ancestor and uses that list as the starting point. Any writes recorded after the migration are then replayed on top.

This means time-travel, get_state_history, and resuming from mid-thread checkpoints all work correctly across the migration boundary — both before and after the swap. The only constraint is that the new reducer must produce the same accumulated value as the old one would have for any writes made after the migration point.
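The walk across the migration boundary can be modeled in a few lines. This is a simplified toy, not the saver implementation: Checkpoint, SENTINEL, restore, and concat are stand-in names, and each toy checkpoint carries the writes that produced it, which may not match how checkpoint_writes are actually keyed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

SENTINEL = object()  # stand-in for DELTA_SENTINEL


@dataclass
class Checkpoint:
    blob: Any  # SENTINEL, or a full stored value
    writes: list = field(default_factory=list)  # writes that produced this checkpoint
    parent: Optional["Checkpoint"] = None


def restore(cp: Checkpoint, reducer: Callable[[Any, list], Any], empty: Any) -> Any:
    """Walk ancestors until a real blob is found, then replay writes oldest-first."""
    pending = []  # collected newest-first during the walk
    node: Optional[Checkpoint] = cp
    while node is not None and node.blob is SENTINEL:
        pending.append(node.writes)
        node = node.parent
    # Seed is the first real value found, or `empty` if the chain ran out.
    state = empty if node is None else node.blob
    for writes in reversed(pending):
        state = reducer(state, writes)
    return state


def concat(state: list, writes: list) -> list:
    return state + writes


# A pre-migration checkpoint holding the full list, then two sentinel checkpoints.
old = Checkpoint(blob=["m1", "m2"])
c3 = Checkpoint(blob=SENTINEL, writes=["m3"], parent=old)
c4 = Checkpoint(blob=SENTINEL, writes=["m4"], parent=c3)

print(restore(c4, concat, []))   # ['m1', 'm2', 'm3', 'm4']
print(restore(old, concat, []))  # ['m1', 'm2']
```

Restoring from old, c3, or c4 all go through the same function, which is the property that lets time-travel work on either side of the swap in this model.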

How it works

checkpoint() always returns DELTA_SENTINEL (a zero-byte msgpack ext marker). On restore, _get_channel_writes_history walks the parent chain and collects checkpoint_writes until it reaches a real blob: a _DeltaSnapshot, a pre-migration full value, or a value written by update_state. replay_writes then folds the collected deltas onto that seed through the reducer.

In durability="async" mode, AsyncPregelLoop tracks in-flight aput_writes futures for DeltaChannel fields and drains them before aput(). This ensures the writes are durable before the sentinel blob that depends on them is committed.
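The drain-before-commit ordering can be illustrated with plain asyncio. This is a sketch of the pattern only, not AsyncPregelLoop's code: ToySaver, run_step, and the in_flight list are hypothetical, and the toy saver merely logs the order in which calls complete.

```python
import asyncio


class ToySaver:
    """Stand-in saver that records the order in which operations finish."""

    def __init__(self) -> None:
        self.log: list = []

    async def aput_writes(self, step: int, writes: list) -> None:
        await asyncio.sleep(0.01)  # simulate slow I/O
        self.log.append(("writes", step))

    async def aput(self, step: int) -> None:
        self.log.append(("checkpoint", step))


async def run_step(saver: ToySaver, step: int, writes: list, in_flight: list) -> None:
    # Start persisting the writes without waiting for them to finish.
    in_flight.append(asyncio.create_task(saver.aput_writes(step, writes)))
    # Drain: wait for every pending write before committing the checkpoint.
    await asyncio.gather(*in_flight)
    in_flight.clear()
    await saver.aput(step)


async def main() -> list:
    saver = ToySaver()
    in_flight: list = []
    for step in range(3):
        await run_step(saver, step, ["msg"], in_flight)
    return saver.log


log = asyncio.run(main())
print(log)
```

Without the gather call, the checkpoint entry for a step could be logged before its writes, which is the ordering the drain is meant to rule out.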