Implement batch() as a single computation with multiple inputs#6

Closed
Nuno Campos (nfcampos) wants to merge 5 commits into main from nc/combined-batch

@nfcampos Nuno Campos (nfcampos) commented Sep 19, 2023

Failing tests, probably wrong approach

@nfcampos Nuno Campos (nfcampos) marked this pull request as draft October 5, 2023 08:55
Base automatically changed from nc/single-out-message to main October 5, 2023 09:00
@nfcampos Nuno Campos (nfcampos) deleted the nc/combined-batch branch August 26, 2024 22:30
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…angchain-ai#7

Comprehensive handoff for a fresh-context agent picking up the
combined Step 1.2b + 1.3 milestone partway through. Covers:

- Where we are (4 foundation commits landed: 720f5b0, 5325311,
  3339ea4, 39206f3).
- What each foundation commit delivered (architectural surface,
  parity gates).
- Verification block (181 cargo, clippy clean, 73/73 + 49 + reject,
  58/58 conformance, 69 parity-gate tests).
- Sub-task plan for the remaining four (#4c channel translation,
  langchain-ai#5 StateGraph compiler, langchain-ai#6 langgraph_rs.backend monkeypatch,
  langchain-ai#7 87-test parity gate green).
- Lessons learned this session: Python::with_gil → Python::attach
  in PyO3 0.28; PyAnyMethods::downcast deprecation; uv pip install
  needs VIRTUAL_ENV explicit; background maturin build can race
  edits; pytest-asyncio not in bridge venv (use anyio); adding Op
  variants requires updating all hand-coded matches; clippy 1.95
  is_multiple_of + collapsed-if-let-chain lints; maturin
  python-source switch needed for langchain-ai#6; PyErr stash side-channel
  pattern for cross-Rust exception class preservation; json round-
  trip is fast enough for value translation.
- Open follow-ups snapshot.
- Sub-task tracking table.

Companion to STEP-1.2A-HANDOFF.md and SESSION-RESUME.md — read all
three on resume.
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…k bridge)

Round-trips Python `BaseChannel.checkpoint()` state through Rust
`from_checkpoint` for every stdlib channel class. Closes the channel-
translation deliverable for the combined Step 1.2b + 1.3 milestone
(`rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`). The runner monkeypatch in
sub-step langchain-ai#6 will use `extract_state` / `apply_state` to wire Python
channel instances through the Rust loop tick.

What landed
-----------
- New `rust/ffi/langgraph-py/src/channel_translate.rs` — Rust
  translation gate. Per-class `round_trip_*` functions parse the
  msgpack-encoded state, build a Rust channel via `from_checkpoint`,
  and re-encode the result. 10 stdlib classes covered (LastValue,
  LastValueAfterFinish, Topic, BinaryOperatorAggregate,
  EphemeralValue, AnyValue, UntrackedValue, NamedBarrierValue,
  NamedBarrierValueAfterFinish, DeltaChannel). Custom user-defined
  channels return `ValueError` (Python: `RustBackendUnsupported`).
- New PyO3 entry points `translate_channel_round_trip` (msgpack
  bytes in, msgpack bytes out) and `supported_channel_classes`.
- New Python helper `parity/scripts/_channel_translate.py`:
  `class_name`, `extract_state`, `apply_state`, `pack_state`,
  `unpack_state`, `RustBackendUnsupported`. Per-class dispatch via
  `_EXTRACTORS` / `_BUILDERS` / `_APPLIERS` dicts; symmetry checked
  at import time. Caller-misuse cases (missing operator/reducer/
  names) raise `ValueError`, distinct from `RustBackendUnsupported`.
- New parity test `parity/scripts/test_channel_translate.py` —
  35 tests covering bridge contract, custom-class rejection,
  caller-misuse errors, and per-class round-trip semantics for all
  10 channels.
- Per-class encoding table documented inline (Rust module docstring)
  and traced back to handoff doc.
- Workspace dep: `rmp-serde = "1"` for serde-style msgpack on the
  Rust side. Python uses `ormsgpack` (already in bridge venv).

Wire format
-----------
msgpack bytes — matches the locked architectural decision in
`phase-1-followups.md` entry langchain-ai#3 §6. Python packs with `ormsgpack`,
Rust decodes via `rmp_serde` into `serde_json::Value`. Same encoding
family the rest of the project uses for checkpoint blobs; future
expansion to ext-coded values (LangChain messages, etc.) layers on
without changing the bridge surface.

Parity gate
-----------
For each channel class, drive the round-trip
  Python.checkpoint() -> msgpack -> Rust.from_checkpoint() ->
  Rust.checkpoint() -> msgpack -> Python.from_checkpoint() -> .get()
and assert the seeded Python channel observes the same state as the
original. For `DeltaChannel` snapshot blobs, the Rust side collapses
to sentinel (matching Python's invariant); we verify the replay
target instead.
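
The shape of that round-trip can be sketched as follows. This is an illustration only: `json` stands in for msgpack so the example needs nothing beyond the standard library, `fake_rust_round_trip` is a hypothetical placeholder for the real PyO3 entry point, and `ToyLastValue` is an assumed minimal channel, not the upstream class.

```python
import json


class ToyLastValue:
    """Minimal channel with the checkpoint / from_checkpoint / get trio."""

    def __init__(self, value=None):
        self.value = value

    def checkpoint(self):
        return self.value

    @classmethod
    def from_checkpoint(cls, blob):
        return cls(blob)

    def get(self):
        return self.value


def fake_rust_round_trip(payload: bytes) -> bytes:
    """Placeholder for the Rust side: decode, rebuild, re-encode."""
    state = json.loads(payload)
    return json.dumps(state).encode()


def round_trip(channel):
    """Python.checkpoint() -> bytes -> 'Rust' -> bytes -> Python.from_checkpoint()."""
    packed = json.dumps(channel.checkpoint()).encode()
    returned = fake_rust_round_trip(packed)
    return type(channel).from_checkpoint(json.loads(returned))


original = ToyLastValue({"messages": ["hi"], "count": 2})
seeded = round_trip(original)
# The gate's claim: the seeded channel observes the same state as the original.
assert seeded.get() == original.get()
```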

What the gate caught
--------------------
- Python's `DeltaChannel.from_checkpoint(MISSING)` is asymmetric:
  it sets `value = typ()` rather than leaving the channel MISSING.
  Test `test_missing_round_trips_missing` documents the asymmetry
  with a `fresh.get() == {}` assertion.
- Subclasses of stdlib channels need a distinct error path from
  fully-custom channels: the runner ought to know "we know the
  parent shape but you customised it" vs "we have no idea what this
  is". `class_name` walks the supported-class MRO to give a precise
  message.
- Caller misuse (missing `operator` / `reducer` / `names` in
  `init_args`) is `ValueError`, not `RustBackendUnsupported`. Two
  failure modes wearing one exception turn 5-minute debugs into
  30-minute ones.
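
The three failure paths above can be sketched as below. Everything here is assumed for illustration: the toy channel classes, the `SUPPORTED` tuple, the `build_args` helper, and the message wording. The source only states that `class_name` walks the MRO and that caller misuse raises `ValueError`.

```python
class RustBackendUnsupported(Exception):
    """The channel cannot be handled by the Rust backend (stand-in class)."""


# Toy stand-ins for supported stdlib channel classes.
class LastValue:
    pass


class BinaryOperatorAggregate:
    pass


SUPPORTED = (LastValue, BinaryOperatorAggregate)


def class_name(channel):
    """Return the supported class name, or explain precisely why not."""
    cls = type(channel)
    if cls in SUPPORTED:
        return cls.__name__
    # Walk the MRO: a subclass of a supported class gets its own message.
    for parent in cls.__mro__[1:]:
        if parent in SUPPORTED:
            raise RustBackendUnsupported(
                f"{cls.__name__} customises supported channel {parent.__name__}"
            )
    raise RustBackendUnsupported(f"{cls.__name__} is not a known channel class")


def build_args(name, init_args):
    """Caller misuse is a ValueError, kept distinct from unsupported channels."""
    if name == "BinaryOperatorAggregate" and "operator" not in init_args:
        raise ValueError("BinaryOperatorAggregate requires init_args['operator']")
    return dict(init_args)
```

With this split, a caller catching `RustBackendUnsupported` to fall back to the Python backend never swallows a plain argument mistake.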

Test counts
-----------
- Cargo workspace: 211 passed (was 210; +1 invalid_msgpack test).
  Clippy clean.
- Phase 0: 73/73 round-trip, 49 allowlist, strict reject; 58
  conformance.
- Phase 1 + 1.2b foundation + #4c: 104 passed (was 69; +35 channel
  translate tests).
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…ler (V0.1 scope)

Minimum-viable Rust `StateGraph` builder that compiles to a runnable
`PregelLoop`. Sub-task langchain-ai#5 of the combined Step 1.2b + 1.3 milestone
(`rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`); satisfies the original Step
1.3 sub-gate (5 fixture graphs trace-equal vs Python `StateGraph`).

The full Python `StateGraph` is 1833 lines + branch helpers; the V0.1
port is sharply scoped to what the 5-fixture sub-gate needs and what
the 87-test gate (langchain-ai#7) ultimately requires from the compiler. Everything
beyond that is documented as deferred to follow-ups so langchain-ai#5 doesn't drag
in features the runner monkeypatch (langchain-ai#6) doesn't need.

What landed
-----------
- New `crates/langgraph-core/src/state_graph/mod.rs`:
  - `StateGraph::new(channels)` (explicit channel map; no
    `Annotated[T, reducer]` schema inference).
  - `add_node`, `add_edge`, `add_conditional_edges`,
    `set_entry_point`, `set_finish_point`, `compile`.
  - `compile()` lowers to a `PregelLoop` by generating synthetic
    `branch:to:NODE` `LastValue<Value>` trigger channels for every
    incoming-edge target. User node callables are wrapped to emit
    sentinel writes for direct outgoing edges + conditional-branch
    resolutions after the user's state-channel writes.
  - `START` / `END` constants. `BRANCH_PREFIX` is a reserved namespace
    (`compile` rejects collisions). For `START -> node` edges, the
    corresponding synthetic input channels are exposed via
    `CompiledGraph.input_channels` so the caller knows which channels
    to feed with `put_input`.
  - 9 cargo unit tests covering compile validation + linear chain +
    conditional fork + fan-out + branch error path.
- New `rust/ffi/langgraph-py/src/state_graph_fixtures.rs` — bridge
  module that builds the 5 fixture graphs (linear_chain,
  conditional_fork, fan_out, conditional_join, recursion) via the new
  `StateGraph` builder. New PyO3 entry point
  `run_state_graph_fixture(name, init_json) -> trace_json`.
- New `parity/scripts/test_state_graph_via_bridge.py` — 25 tests
  driving each fixture against the upstream Python `StateGraph` and
  comparing user-visible state + node execution sequence.
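
The lowering step that `compile()` performs for direct edges can be sketched in a few lines. This is a Python toy, not the Rust implementation: the `lower` function, its return shape, and the literal values of `START`, `END`, and `BRANCH_PREFIX` are assumptions (the prefix is inferred from the `branch:to:NODE` channel names mentioned above), and conditional edges are left out.

```python
BRANCH_PREFIX = "branch:to:"  # assumed literal, inferred from `branch:to:NODE`
START, END = "__start__", "__end__"  # assumed values for the sketch


def lower(nodes, edges, channels):
    """Turn direct edges into synthetic trigger channels.

    Returns (triggers, input_channels, sentinel_writes):
      triggers        -- node name -> trigger channel that wakes it
      input_channels  -- trigger channels the caller must seed (START edges)
      sentinel_writes -- node name -> trigger channels it writes after running
    """
    for name in channels:
        if name.startswith(BRANCH_PREFIX):
            raise ValueError(f"channel {name!r} collides with reserved prefix")

    triggers = {}
    input_channels = []
    sentinel_writes = {node: [] for node in nodes}
    for src, dst in edges:
        if dst == END:
            continue  # nothing to wake after the finish point
        trigger = BRANCH_PREFIX + dst
        triggers[dst] = trigger
        if src == START:
            input_channels.append(trigger)
        else:
            sentinel_writes[src].append(trigger)
    return triggers, input_channels, sentinel_writes


triggers, inputs, writes = lower(
    nodes=["a", "b"],
    edges=[(START, "a"), ("a", "b"), ("b", END)],
    channels={"total": 0},
)
```

For this linear chain, node `a` is woken by the seeded input channel and wakes `b` by writing to `b`'s trigger channel after its own state writes.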

Out of scope (V0.1, deferred to follow-ups)
-------------------------------------------
- Schema inference from `Annotated[T, reducer]`. Caller passes a
  `dyn ChannelKind` map directly. Rationale: Rust has no runtime
  reflection over `Annotated`-style metadata; bringing that surface
  in is a Step 4.5-style concern (Phase 0 follow-up langchain-ai#2).
- Subgraphs. `add_node` does not accept a nested `CompiledGraph`.
- `defer=True` deferred nodes.
- Async-only nodes / `astream`. Sub-step langchain-ai#6 owns the async
  monkeypatch path.
- Runtime context object.
- `add_sequence` (chains of nodes).
- Node return-value coercion. Rust nodes return explicit
  `Vec<Write>`; Python's "return dict → infer state writes" is
  handled at the runner boundary in langchain-ai#6.

Parity gate
-----------
For each of the 5 fixtures: build the same logical graph with Rust
`StateGraph` AND Python `StateGraph`, drive with the same input,
compare:
  * user-visible state-channel final values (must match);
  * node execution order (must match for deterministic graphs);
  * for parallel branches (fan_out, conditional_join), the *set* of
    nodes that fired per superstep (parallel ordering canonicalised by
    Pregel).
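
The comparison above can be sketched as a small function. The trace shape (a `state` dict plus a `supersteps` list of node-name lists) is an assumption for the example; the real `trace_json` layout is not shown here.

```python
def compare_traces(rust, python):
    """Return a list of mismatches; an empty list means parity."""
    problems = []
    if rust["state"] != python["state"]:
        problems.append("final state differs")
    # Compare node *sets* per superstep: ordering inside a superstep is not
    # significant, but the sequence of supersteps is. For a deterministic
    # linear graph each set has one node, so this also checks execution order.
    rust_steps = [frozenset(step) for step in rust["supersteps"]]
    python_steps = [frozenset(step) for step in python["supersteps"]]
    if rust_steps != python_steps:
        problems.append("superstep node sets differ")
    return problems


rust_trace = {"state": {"x": 2}, "supersteps": [["a"], ["b", "c"], ["join"]]}
python_trace = {"state": {"x": 2}, "supersteps": [["a"], ["c", "b"], ["join"]]}
assert compare_traces(rust_trace, python_trace) == []
```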

What the gate caught
--------------------
- `recursion` final counter matches Python; total `step` fire count
  is documented-divergent: Python's `add_conditional_edges` evaluates
  the branch on POST-write state while the V0.1 Rust builder evaluates
  on PRE-write state. Same divergence as the Step 1.2a hand-rolled
  recursion fixture; final-state parity is the actual claim.
- Branch path-map keys must match the resolved-key lookup. An unknown
  key surfaces as `PregelError::NodeFailed { node, message }` (the
  same path Python exception classes use in #4b).
- `thiserror` magic-treats fields named `source` as `#[source]` —
  caught at compile time, renamed to `node`.
- `PregelLoop` / `CompiledGraph` need explicit `Debug` (the bridge
  fields are PyO3-flavoured and don't auto-derive). Manual impl on
  `CompiledGraph` keeps the public surface usable from `unwrap_err()`
  in tests.
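
The PRE-write versus POST-write divergence from the first bullet can be reproduced with a toy loop. The fixture below is hypothetical: the saturating `step` node and the `LIMIT` constant are chosen so that the final counter agrees while the fire count differs, which is one way such a divergence can arise and not necessarily how the real `recursion` fixture is built.

```python
LIMIT = 3


def step(state):
    """Node: increment the counter, saturating at LIMIT."""
    return {"counter": min(state["counter"] + 1, LIMIT)}


def keep_going(state):
    """Branch condition: loop back to `step` while below LIMIT."""
    return state["counter"] < LIMIT


def run(evaluate_on):
    """Run the loop, evaluating the branch on 'pre' or 'post' write state."""
    state = {"counter": 0}
    fires = 0
    while True:
        before = dict(state)
        state.update(step(state))
        fires += 1
        seen = state if evaluate_on == "post" else before
        if not keep_going(seen):
            return state["counter"], fires


post_final, post_fires = run("post")  # branch sees the state after the write
pre_final, pre_fires = run("pre")     # branch sees the state before the write
assert post_final == pre_final        # final-state parity holds
assert pre_fires == post_fires + 1    # the pre-write variant fires once more
```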

Test counts
-----------
- Cargo workspace: 220 passed (was 211; +9 state_graph unit tests).
  Clippy clean.
- Phase 0: 73/73 + 49 + strict reject; 58 conformance.
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5: 129 passed (was 104; +25
  StateGraph parity tests).
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…doff for langchain-ai#6/langchain-ai#7

Milestone update for the combined Step 1.2b + 1.3 final stretch.

Locked architectural decision (2026-05-06)
------------------------------------------
The original plan §6 and `phase-1-followups.md` entry langchain-ai#3 §5 left the door
open to "a tighter cut decided in implementation" for the
`langgraph_rs.backend` monkeypatch — i.e., replacing only `tick()`
(approach B) or only `_algo.apply_writes` + `prepare_next_tasks`
(approach C) instead of the full `SyncPregelLoop` (approach A).

The user has explicitly chosen approach A: full `SyncPregelLoop`
replacement. Reasoning captured in the new handoff doc:

  * A is the only approach where `LANGGRAPH_BACKEND=rust` actually
    means "Rust drives the loop" — B and C still leave Python
    orchestrating most per-tick work.
  * B's per-tick re-sync of channel state is wasteful and adds an
    extra parity surface that is a correctness risk we don't need.
  * C is essentially a third copy of `test_pregel_differential.py`'s
    coverage — buys us nothing new.
  * "Done right the first time" — the full replacement is bigger but
    architecturally honest; a tighter cut is technical debt that
    would need to be redone before Step 1.4 streaming or Phase 2.

The "re-build from checkpoint each tick" guidance from the original
phase-1-followups langchain-ai#3 §6 is also superseded: under approach A, Rust
state is constructed once at `__enter__` (Python → Rust via
`_channel_translate.extract_state`) and applied once at `__exit__`
(Rust → Python via `apply_state`). No per-tick re-sync.
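
The once-in, once-out lifecycle can be sketched with a context manager. This is a toy under stated assumptions: `ToyLoop` stands in for the upstream loop class, a plain dict stands in for Rust-owned state, and skipping the write-back when an exception is in flight is a choice made for the example, not something the handoff specifies.

```python
class Channel:
    def __init__(self, value):
        self.value = value


class ToyLoop:
    """Stand-in for the upstream Python loop; not the real SyncPregelLoop."""

    def __init__(self, channels):
        self.channels = channels

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def tick(self):
        self.channels["counter"].value += 1
        return self.channels["counter"].value < 3


class ToyRustLoop(ToyLoop):
    """Subclass + override: state crosses the boundary once in each direction."""

    extract_calls = 0
    apply_calls = 0

    def __enter__(self):
        super().__enter__()
        self.extract_calls += 1  # Python -> "Rust", once
        self._rust_state = {k: ch.value for k, ch in self.channels.items()}
        return self

    def tick(self):
        # Only the "Rust"-side copy changes during the run; no per-tick re-sync.
        self._rust_state["counter"] += 1
        return self._rust_state["counter"] < 3

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.apply_calls += 1  # "Rust" -> Python, once
            for key, value in self._rust_state.items():
                self.channels[key].value = value
        return super().__exit__(exc_type, exc, tb)


def run(loop):
    with loop:
        while loop.tick():
            pass
    return loop


channels = {"counter": Channel(0)}
loop = run(ToyRustLoop(channels))
```

After the run, the Python channel holds the final value and each translation direction has been exercised exactly once, regardless of how many ticks occurred.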

What this commit changes
------------------------
- New `rust/docs/STEP-1.2B-FINAL-HANDOFF.md`: the comprehensive
  handoff brief for the next session picking up langchain-ai#6 and langchain-ai#7. Covers:

  * Where we are (status table through `c03c7ac6`).
  * Locked architectural decision (approach A).
  * langchain-ai#6 sub-step breakdown (#6a Maturin layout switch → #6b
    backend.py monkeypatch → #6c Pregel runtime bridge entry point).
  * langchain-ai#7 iteration loop (87-test gate).
  * `__init__.py` re-export shim contents (drop-in for the layout
    switch).
  * Replaced symbols list pattern for `backend.py`.
  * `RustBackendUnsupported` rejection sites for the deliberately
    out-of-scope feature families (custom channels, subgraphs, Send,
    interrupts, stream modes outside values/updates).
  * Verification block, hard rules, bridge install gotcha,
    lessons-learned forwarding from prior handoffs.

- `rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`: prepended a
  SUPERSEDED notice pointing at the new final handoff for langchain-ai#6/langchain-ai#7. The
  partial-handoff content is preserved as historical context for what
  shipped in #4c and langchain-ai#5.

- `rust/docs/phase-1-followups.md` entry langchain-ai#3 §5 + §6: amended to
  record the approach A decision and the supersession of the
  per-tick-resync line.

- `.omc/plans/langgraph-rust-port-2026-04-30.md` §6 Step 1.2b+1.3
  Locked decisions §4 + §5: same amendments, with a pointer to the
  final handoff doc.

What's not changing
-------------------
The 5 hard architectural decisions in §6 ("combined milestone",
"async runtime: pyo3-async-runtimes", "GIL discipline", "errors via
PregelExecutionError::NodeFailed", "channel translation by class
name") remain locked. Approach A is the runtime-shape decision that
sits *above* those.

Test counts
-----------
Unchanged — pure docs commit. Latest baseline (HEAD = `c03c7ac6`):
  * Cargo: 220 passed, clippy clean.
  * Phase 0: 73/73 + 49 + strict reject; 58 conformance.
  * Phase 1 + 1.2b foundation + #4c + langchain-ai#5: 129 passed.
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…nly gate green

Closes the combined Step 1.2b + 1.3 milestone. The
``LANGGRAPH_BACKEND=rust`` filter on
``libs/langgraph/tests/test_pregel.py`` matches **81 tests** (the
handoff's "87" estimate was written before the test set drifted; the
``-k "memory and not streaming and not interrupt and not subgraph and
not send"`` filter is verbatim). All 81 pass on first run after
sub-step #6c landed — no triage iteration was needed.

What landed
-----------
- ``parity/scripts/run_87_test_gate.sh`` — runnable wrapper that sets
  ``NO_DOCKER=true`` (skips redis/postgres fixtures the bridge venv
  doesn't carry) and ``LANGGRAPH_BACKEND=rust``, points pytest at the
  filter, and forwards extra args. Single command for re-running
  the gate locally.
- ``rust/ffi/langgraph-py/pyproject.toml`` — added ``[gate-87]``
  dependency-group capturing the four collection-time deps the
  upstream conftest pulls in (``redis``, ``pytest-mock``, ``syrupy``,
  ``pycryptodome``). The bridge venv was missing these because the
  set is what ``libs/langgraph/.venv`` carries for its own test
  suite, not what the bridge needs for codec parity. Documenting in
  the dependency-group keeps the install command self-describing
  (``uv pip install --group gate-87``).
- ``rust/docs/phase-1-followups.md`` — entry langchain-ai#3 (async PyO3 bridge
  + ``LANGGRAPH_BACKEND=rust`` wiring) marked **closed**. Added the
  amendment note that the implementation chose subclass + override
  for ``_RustSyncPregelLoop`` (rather than the literal stand-alone
  duck-typed shadow class the prose example sketched), with the
  rationale matching the design discussion at the start of #6b.
  The async surface stays deferred — ``_RustAsyncPregelLoop``
  raises at ``__aenter__`` until a phase that needs streaming /
  ``astream`` parity owns it.

Bridge-venv setup deltas (one-time, since this commit)
------------------------------------------------------
- ``redis``, ``pytest-mock``, ``syrupy``, ``pycryptodome`` installed
  via the new ``gate-87`` dependency group.
- ``libs/checkpoint-sqlite`` and ``libs/checkpoint-postgres``
  installed in editable mode so the conftest can import
  ``langgraph.cache.sqlite``. (The other libs were already
  editable-installed by Phase 0.)

Parity gate (the milestone gate)
--------------------------------
::

    NO_DOCKER=true LANGGRAPH_BACKEND=rust \
        rust/ffi/langgraph-py/.venv/bin/python -m pytest \
        libs/langgraph/tests/test_pregel.py \
        -k "memory and not streaming and not interrupt and not subgraph and not send"

Result: ``81 passed, 376 deselected in 9.98s``. Sanity check
confirmed the Rust runtime is genuinely driving the loop (not a
silent fallback to upstream Python): instrumenting
``run_pregel_loop_topology`` with a call counter shows it's
invoked on every ``graph.invoke`` under ``LANGGRAPH_BACKEND=rust``,
both ``langgraph.pregel._loop`` and ``langgraph.pregel.main``
namespaces resolve ``SyncPregelLoop`` to ``_RustSyncPregelLoop``,
and ``backend.is_active()`` returns ``True``.
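
A sanity check of this kind can be sketched as follows. The names are stand-ins: `UpstreamLoop` and `RustLoop` play the roles of the upstream and replacement loop classes, two `SimpleNamespace` objects play the two module namespaces, and `install` / `counting` are hypothetical helpers, not the real `backend.py` API.

```python
import functools
import types


def counting(fn):
    """Wrap fn so every call is tallied on wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


class UpstreamLoop:
    backend = "python"


class RustLoop(UpstreamLoop):
    backend = "rust"


def make_namespaces():
    loop_ns = types.SimpleNamespace(SyncPregelLoop=UpstreamLoop)
    # A `from ._loop import SyncPregelLoop` copies the binding, so the second
    # namespace holds its own reference and must be patched separately.
    main_ns = types.SimpleNamespace(SyncPregelLoop=loop_ns.SyncPregelLoop)
    return loop_ns, main_ns


def install(environ, namespaces):
    """Swap the loop class in every namespace when the backend is requested."""
    if environ.get("LANGGRAPH_BACKEND") != "rust":
        return False
    for ns in namespaces:
        ns.SyncPregelLoop = RustLoop
    return True


loop_ns, main_ns = make_namespaces()
active = install({"LANGGRAPH_BACKEND": "rust"}, (loop_ns, main_ns))
run_loop = counting(lambda graph_input: graph_input)  # stand-in for the Rust entry point
run_loop({"x": 1})
```

Checking both namespaces matters because patching only one leaves the other silently resolving to the upstream class, which is exactly the kind of fallback the sanity check is meant to rule out.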

What the gate caught
--------------------
Nothing. The 81 tests passed on first run after the bridge wheel
was rebuilt with #6c and the bridge venv had its collection-time
deps installed. The handoff explicitly warned to "expect failures
to send you back to #4c / langchain-ai#5 / langchain-ai#6 for incremental fixes"; that
budget went unused. Plausible reasons:

1. The four channel-translation rejection sites
   (``CONFIG_KEY_READ`` shadow, panic-stub binop, custom-channel
   gate, async ``__aenter__``) cleanly cover the corners that
   would have been the most likely failure surfaces. The 87-test
   filter ``-k`` excludes the patterns those rejections would
   trip on (``streaming``, ``interrupt``, ``subgraph``, ``send``).
2. The local-state shadow ``CONFIG_KEY_READ`` reader the wrapper
   provides is enough for every conditional-edge test in the
   filter — none of them read channels the routing node didn't
   write.
3. The translation surface from sub-step #4c (1,200+ hypothesis
   iterations across the 10 stdlib channel classes) was already
   verified, so the per-class state encoding round-trips cleanly
   under load.

Test counts
-----------
- Cargo workspace: 220 passed; clippy clean.
- Phase 0: 73/73 corpus + 49 allowlist + strict reject;
  58 conformance pass / 0 fail.
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5 + #6a + #6b + #6c
  (LANGGRAPH_BACKEND unset): 141 passed.
- **Combined Step 1.2b + 1.3 milestone gate (LANGGRAPH_BACKEND=rust):
  81 passed / 0 failed.**