Implement batch() as a single computation with multiple inputs by nfcampos · Pull Request #6 · langchain-ai/langgraph

Nuno Campos (nfcampos) · 2023-09-19T14:56:13Z

Failing tests, probably wrong approach

… value at a time

…angchain-ai#7

Comprehensive handoff for a fresh-context agent picking up the combined Step 1.2b + 1.3 milestone partway through. Covers:

- Where we are (4 foundation commits landed: 720f5b0, 5325311, 3339ea4, 39206f3).
- What each foundation commit delivered (architectural surface, parity gates).
- Verification block (181 cargo, clippy clean, 73/73 + 49 + reject, 58/58 conformance, 69 parity-gate tests).
- Sub-task plan for the remaining four (#4c channel translation, langchain-ai#5 StateGraph compiler, langchain-ai#6 langgraph_rs.backend monkeypatch, langchain-ai#7 87-test parity gate green).
- Lessons learned this session: Python::with_gil → Python::attach in PyO3 0.28; PyAnyMethods::downcast deprecation; uv pip install needs VIRTUAL_ENV explicit; background maturin build can race edits; pytest-asyncio not in bridge venv (use anyio); adding Op variants requires updating all hand-coded matches; clippy 1.95 is_multiple_of + collapsed-if-let-chain lints; maturin python-source switch needed for langchain-ai#6; PyErr stash side-channel pattern for cross-Rust exception class preservation; json round-trip is fast enough for value translation.
- Open follow-ups snapshot.
- Sub-task tracking table.

Companion to STEP-1.2A-HANDOFF.md and SESSION-RESUME.md; read all three on resume.

…k bridge)

Round-trips Python `BaseChannel.checkpoint()` state through Rust `from_checkpoint` for every stdlib channel class. Closes the channel-translation deliverable for the combined Step 1.2b + 1.3 milestone (`rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`). The runner monkeypatch in sub-step langchain-ai#6 will use `extract_state` / `apply_state` to wire Python channel instances through the Rust loop tick.

What landed
-----------

- New `rust/ffi/langgraph-py/src/channel_translate.rs`: Rust translation gate. Per-class `round_trip_*` functions parse the msgpack-encoded state, build a Rust channel via `from_checkpoint`, and re-encode the result. 10 stdlib classes covered (LastValue, LastValueAfterFinish, Topic, BinaryOperatorAggregate, EphemeralValue, AnyValue, UntrackedValue, NamedBarrierValue, NamedBarrierValueAfterFinish, DeltaChannel). Custom user-defined channels return `ValueError` (Python: `RustBackendUnsupported`).
- New PyO3 entry points `translate_channel_round_trip` (msgpack bytes in, msgpack bytes out) and `supported_channel_classes`.
- New Python helper `parity/scripts/_channel_translate.py`: `class_name`, `extract_state`, `apply_state`, `pack_state`, `unpack_state`, `RustBackendUnsupported`. Per-class dispatch via `_EXTRACTORS` / `_BUILDERS` / `_APPLIERS` dicts; symmetry checked at import time. Caller-misuse cases (missing operator/reducer/names) raise `ValueError`, distinct from `RustBackendUnsupported`.
- New parity test `parity/scripts/test_channel_translate.py`: 35 tests covering bridge contract, custom-class rejection, caller-misuse errors, and per-class round-trip semantics for all 10 channels.
- Per-class encoding table documented inline (Rust module docstring) and traced back to handoff doc.
- Workspace dep: `rmp-serde = "1"` for serde-style msgpack on the Rust side. Python uses `ormsgpack` (already in bridge venv).

Wire format
-----------

msgpack bytes: matches the locked architectural decision in `phase-1-followups.md` entry langchain-ai#3 §6. Python packs with `ormsgpack`, Rust decodes via `rmp_serde` into `serde_json::Value`. Same encoding family the rest of the project uses for checkpoint blobs; future expansion to ext-coded values (LangChain messages, etc.) layers on without changing the bridge surface.

Parity gate
-----------

For each channel class, drive the round-trip

    Python.checkpoint() -> msgpack -> Rust.from_checkpoint() -> Rust.checkpoint() -> msgpack -> Python.from_checkpoint() -> .get()

and assert the seeded Python channel observes the same state as the original. For `DeltaChannel` snapshot blobs, the Rust side collapses to sentinel (matching Python's invariant); we verify the replay target instead.

What the gate caught
--------------------

- Python's `DeltaChannel.from_checkpoint(MISSING)` is asymmetric: it sets `value = typ()` rather than leaving the channel MISSING. Test `test_missing_round_trips_missing` documents the asymmetry with a `fresh.get() == {}` assertion.
- Subclasses of stdlib channels need a distinct error path from fully-custom channels: the runner ought to know "we know the parent shape but you customised it" vs "we have no idea what this is". `class_name` walks the supported-class MRO to give a precise message.
- Caller misuse (missing `operator` / `reducer` / `names` in `init_args`) is `ValueError`, not `RustBackendUnsupported`. Two failure modes wearing one exception turns 5-minute debugs into 30-minute ones.

Test counts
-----------

- Cargo workspace: 211 passed (was 210; +1 invalid_msgpack test). Clippy clean.
- Phase 0: 73/73 round-trip, 49 allowlist, strict reject; 58 conformance.
- Phase 1 + 1.2b foundation + #4c: 104 passed (was 69; +35 channel translate tests).
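The per-class dispatch and the two distinct rejection messages described in this commit message can be illustrated roughly as follows. This is only a sketch under stated assumptions: the `LastValue` and `Topic` classes are simplified stand-ins, `_SUPPORTED` is a hypothetical helper table, `json` stands in for msgpack so the block needs nothing beyond the standard library, and the bodies of `class_name`, `extract_state` and `apply_state` are guesses at the shape, not the code in `_channel_translate.py`.

```python
import json


class RustBackendUnsupported(Exception):
    """Channel class has no translation (fully custom, or a customised subclass)."""


# Simplified stand-ins for two stdlib channel classes (assumption, not the real ones).
class LastValue:
    def __init__(self, value=None):
        self.value = value


class Topic:
    def __init__(self, values=()):
        self.values = list(values)


# Hypothetical table of exactly-supported classes.
_SUPPORTED = {LastValue: "LastValue", Topic: "Topic"}

_EXTRACTORS = {
    "LastValue": lambda ch: {"value": ch.value},
    "Topic": lambda ch: {"values": list(ch.values)},
}
_APPLIERS = {
    "LastValue": lambda ch, st: setattr(ch, "value", st["value"]),
    "Topic": lambda ch, st: setattr(ch, "values", list(st["values"])),
}

# Symmetry check at import time: every supported class needs both directions.
assert set(_SUPPORTED.values()) == set(_EXTRACTORS) == set(_APPLIERS)


def class_name(channel):
    """Return the supported class name, or raise with a precise reason."""
    cls = type(channel)
    if cls in _SUPPORTED:
        return _SUPPORTED[cls]
    # Walk the MRO so subclasses get a different message from unknown classes.
    for parent in cls.__mro__[1:]:
        if parent in _SUPPORTED:
            raise RustBackendUnsupported(
                f"{cls.__name__} customises supported channel {_SUPPORTED[parent]}"
            )
    raise RustBackendUnsupported(f"unknown channel class {cls.__name__}")


def extract_state(channel):
    """Encode channel state to bytes (json here; the commit uses msgpack)."""
    name = class_name(channel)
    return json.dumps({"class": name, "state": _EXTRACTORS[name](channel)}).encode()


def apply_state(channel, blob):
    """Decode bytes and write the state back onto a channel of the same class."""
    payload = json.loads(blob)
    name = class_name(channel)
    if payload["class"] != name:
        # Caller misuse is a ValueError, kept distinct from RustBackendUnsupported.
        raise ValueError(f"state for {payload['class']} applied to {name}")
    _APPLIERS[name](channel, payload["state"])
```

Keying `_SUPPORTED` on the class object rather than its name means a user subclass that happens to reuse a stdlib name is still routed to the "customised" error path.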

…ler (V0.1 scope)

Minimum-viable Rust `StateGraph` builder that compiles to a runnable `PregelLoop`. Sub-task langchain-ai#5 of the combined Step 1.2b + 1.3 milestone (`rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`); satisfies the original Step 1.3 sub-gate (5 fixture graphs trace-equal vs Python `StateGraph`).

The full Python `StateGraph` is 1833 lines + branch helpers; the V0.1 port is sharply scoped to what the 5-fixture sub-gate needs and what the 87-test gate (langchain-ai#7) ultimately requires from the compiler. Everything beyond that is documented as deferred to follow-ups so langchain-ai#5 doesn't drag features the runner monkeypatch (langchain-ai#6) doesn't need.

What landed
-----------

- New `crates/langgraph-core/src/state_graph/mod.rs`:
  - `StateGraph::new(channels)` (explicit channel map; no `Annotated[T, reducer]` schema inference).
  - `add_node`, `add_edge`, `add_conditional_edges`, `set_entry_point`, `set_finish_point`, `compile`.
  - `compile()` lowers to a `PregelLoop` by generating synthetic `branch:to:NODE` `LastValue<Value>` trigger channels for every incoming-edge target. User node callables are wrapped to emit sentinel writes for direct outgoing edges + conditional-branch resolutions after the user's state-channel writes.
  - `START` / `END` constants. `BRANCH_PREFIX` reserved namespace (compile rejects collisions). `START -> node` edges return the corresponding synthetic input channel via `CompiledGraph.input_channels` so the caller knows what to put_input.
  - 9 cargo unit tests covering compile validation + linear chain + conditional fork + fan-out + branch error path.
- New `rust/ffi/langgraph-py/src/state_graph_fixtures.rs`: bridge module that builds the 5 fixture graphs (linear_chain, conditional_fork, fan_out, conditional_join, recursion) via the new `StateGraph` builder. New PyO3 entry point `run_state_graph_fixture(name, init_json) -> trace_json`.
- New `parity/scripts/test_state_graph_via_bridge.py`: 25 tests driving each fixture against the upstream Python `StateGraph` and comparing user-visible state + node execution sequence.

Out of scope (V0.1, deferred to follow-ups)
-------------------------------------------

- Schema inference from `Annotated[T, reducer]`. Caller passes a `dyn ChannelKind` map directly. Rationale: Rust has no runtime reflection over `Annotated`-style metadata; bringing that surface in is a Step 4.5-style concern (Phase 0 follow-up langchain-ai#2).
- Subgraphs. `add_node` does not accept a nested `CompiledGraph`.
- `defer=True` deferred nodes.
- Async-only nodes / `astream`. Sub-step langchain-ai#6 owns the async monkeypatch path.
- Runtime context object.
- `add_sequence` (chains of nodes).
- Node return-value coercion. Rust nodes return explicit `Vec<Write>`; Python's "return dict → infer state writes" is handled at the runner boundary in langchain-ai#6.

Parity gate
-----------

For each of the 5 fixtures: build the same logical graph with Rust `StateGraph` AND Python `StateGraph`, drive with the same input, compare:

* user-visible state-channel final values (must match);
* node execution order (must match for deterministic graphs);
* for parallel branches (fan_out, conditional_join), the *set* of nodes that fired per superstep (parallel ordering canonicalised by Pregel).

What the gate caught
--------------------

- `recursion` final counter matches Python; total `step` fire count is documented-divergent: Python's `add_conditional_edges` evaluates the branch on POST-write state while the V0.1 Rust builder evaluates on PRE-write state. Same divergence as the Step 1.2a hand-rolled recursion fixture; final-state parity is the actual claim.
- Branch path-map keys must match the resolved-key lookup. An unknown key surfaces as `PregelError::NodeFailed { node, message }` (the same path Python exception classes use in #4b).
- `thiserror` magic-treats fields named `source` as `#[source]`: caught at compile time, renamed to `node`.
- `PregelLoop` / `CompiledGraph` need explicit `Debug` (the bridge fields are PyO3-flavoured and don't auto-derive). Manual impl on `CompiledGraph` keeps the public surface usable from `unwrap_err()` in tests.

Test counts
-----------

- Cargo workspace: 220 passed (was 211; +9 state_graph unit tests). Clippy clean.
- Phase 0: 73/73 + 49 + strict reject; 58 conformance.
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5: 129 passed (was 104; +25 StateGraph parity tests).
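The lowering step described under "What landed" (direct edges become writes to synthetic `branch:to:NODE` trigger channels, and a node fires in the superstep after its trigger was written) can be sketched in a few lines. The following is a toy model in Python rather than the Rust builder: `compile_graph` and `run` are hypothetical names, the `START`/`END` string values are assumed, conditional edges are left out, and state writes use plain last-write-wins semantics.

```python
BRANCH_PREFIX = "branch:to:"
START, END = "__start__", "__end__"  # assumed sentinel values


def compile_graph(nodes, edges):
    """nodes: name -> fn(state) -> dict of updates; edges: list of (src, dst)."""
    for name in nodes:
        if name.startswith(BRANCH_PREFIX):
            raise ValueError(f"node name {name!r} collides with reserved prefix")
    outgoing = {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)

    def wrap(name, fn):
        def wrapped(state):
            writes = list(fn(state).items())  # user state writes first
            for dst in outgoing.get(name, []):
                if dst != END:
                    writes.append((BRANCH_PREFIX + dst, True))  # sentinel write
            return writes
        return wrapped

    compiled = {name: wrap(name, fn) for name, fn in nodes.items()}
    # START -> node edges become the input channels the caller must trigger.
    input_channels = [BRANCH_PREFIX + dst for dst in outgoing.get(START, [])]
    return compiled, input_channels


def run(compiled, input_channels, state, max_steps=25):
    """Drive supersteps until no trigger channel was written."""
    triggered = set(input_channels)
    trace = []
    for _ in range(max_steps):
        if not triggered:
            break
        fired = sorted(ch[len(BRANCH_PREFIX):] for ch in triggered)
        trace.append(fired)
        writes = []
        for name in fired:
            writes.extend(compiled[name](dict(state)))  # pre-step snapshot
        triggered = set()
        for channel, value in writes:
            if channel.startswith(BRANCH_PREFIX):
                triggered.add(channel)
            else:
                state[channel] = value
    return state, trace
```

Sorting the fired nodes per superstep mirrors the parity gate's comparison of the set of nodes per superstep for parallel branches, so the trace is stable regardless of write order.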

…doff for langchain-ai#6/langchain-ai#7

Milestone update for the combined Step 1.2b + 1.3 final stretch.

Locked architectural decision (2026-05-06)
------------------------------------------

The original plan §6 and `phase-1-followups.md` entry langchain-ai#3 §5 left the door open to "a tighter cut decided in implementation" for the `langgraph_rs.backend` monkeypatch, i.e., replacing only `tick()` (approach B) or only `_algo.apply_writes` + `prepare_next_tasks` (approach C) instead of the full `SyncPregelLoop` (approach A).

The user has explicitly chosen approach A: full `SyncPregelLoop` replacement. Reasoning captured in the new handoff doc:

* A is the only approach where `LANGGRAPH_BACKEND=rust` actually means "Rust drives the loop"; B and C still leave Python orchestrating most per-tick work.
* B's per-tick re-sync of channel state is wasteful and adds an extra parity surface that's correctness risk we don't need.
* C is essentially a third copy of `test_pregel_differential.py`'s coverage; buys us nothing new.
* "Done right the first time": the full replacement is bigger but architecturally honest; a tighter cut is technical debt that would need to be redone before Step 1.4 streaming or Phase 2.

The "re-build from checkpoint each tick" guidance from the original phase-1-followups langchain-ai#3 §6 is also superseded: under approach A, Rust state is constructed once at `__enter__` (Python → Rust via `_channel_translate.extract_state`) and applied once at `__exit__` (Rust → Python via `apply_state`). No per-tick re-sync.

What this commit changes
------------------------

- New `rust/docs/STEP-1.2B-FINAL-HANDOFF.md`: the comprehensive handoff brief for the next session picking up langchain-ai#6 and langchain-ai#7. Covers:
  * Where we are (status table through `c03c7ac6`).
  * Locked architectural decision (approach A).
  * langchain-ai#6 sub-step breakdown (#6a Maturin layout switch → #6b backend.py monkeypatch → #6c Pregel runtime bridge entry point).
  * langchain-ai#7 iteration loop (87-test gate).
  * `__init__.py` re-export shim contents (drop-in for the layout switch).
  * Replaced symbols list pattern for `backend.py`.
  * `RustBackendUnsupported` rejection sites for the 4 deliberately out-of-scope feature families (custom channels, subgraphs, Send, interrupts, stream modes outside values/updates).
  * Verification block, hard rules, bridge install gotcha, lessons-learned forwarding from prior handoffs.
- `rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`: prepended a SUPERSEDED notice pointing at the new final handoff for langchain-ai#6/langchain-ai#7. The partial-handoff content is preserved as historical context for what shipped in #4c and langchain-ai#5.
- `rust/docs/phase-1-followups.md` entry langchain-ai#3 §5 + §6: amended to record the approach A decision and the supersession of the per-tick-resync line.
- `.omc/plans/langgraph-rust-port-2026-04-30.md` §6 Step 1.2b+1.3 Locked decisions §4 + §5: same amendments, with a pointer to the final handoff doc.

What's not changing
-------------------

The 5 hard architectural decisions in §6 ("combined milestone", "async runtime: pyo3-async-runtimes", "GIL discipline", "errors via PregelExecutionError::NodeFailed", "channel translation by class name") remain locked. Approach A is the runtime-shape decision that sits *above* those.

Test counts
-----------

Unchanged: pure docs commit. Latest baseline (HEAD = `c03c7ac6`):

* Cargo: 220 passed, clippy clean.
* Phase 0: 73/73 + 49 + strict reject; 58 conformance.
* Phase 1 + 1.2b foundation + #4c + langchain-ai#5: 129 passed.

…nly gate green

Closes the combined Step 1.2b + 1.3 milestone. The ``LANGGRAPH_BACKEND=rust`` filter on ``libs/langgraph/tests/test_pregel.py`` matches **81 tests** (the handoff's "87" estimate was written before the test set drifted; the ``-k "memory and not streaming and not interrupt and not subgraph and not send"`` filter is verbatim). All 81 pass on first run after sub-step #6c landed; no triage iteration was needed.

What landed
-----------

- ``parity/scripts/run_87_test_gate.sh``: runnable wrapper that sets ``NO_DOCKER=true`` (skips redis/postgres fixtures the bridge venv doesn't carry) and ``LANGGRAPH_BACKEND=rust``, points pytest at the filter, and forwards extra args. Single command for re-running the gate locally.
- ``rust/ffi/langgraph-py/pyproject.toml``: added ``[gate-87]`` dependency-group capturing the four collection-time deps the upstream conftest pulls in (``redis``, ``pytest-mock``, ``syrupy``, ``pycryptodome``). The bridge venv was missing these because the set is what ``libs/langgraph/.venv`` carries for its own test suite, not what the bridge needs for codec parity. Documenting in the dependency-group keeps the install command self-describing (``uv pip install --group gate-87``).
- ``rust/docs/phase-1-followups.md``: entry langchain-ai#3 (async PyO3 bridge + ``LANGGRAPH_BACKEND=rust`` wiring) marked **closed**. Added the amendment note that the implementation chose subclass + override for ``_RustSyncPregelLoop`` (rather than the literal stand-alone duck-typed shadow class the prose example sketched), with the rationale matching the design discussion at the start of #6b. The async surface stays deferred: ``_RustAsyncPregelLoop`` raises at ``__aenter__`` until a phase that needs streaming / ``astream`` parity owns it.

Bridge-venv setup deltas (one-time, since this commit)
------------------------------------------------------

- ``redis``, ``pytest-mock``, ``syrupy``, ``pycryptodome`` installed via the new ``gate-87`` dependency group.
- ``libs/checkpoint-sqlite`` and ``libs/checkpoint-postgres`` installed in editable mode so the conftest can import ``langgraph.cache.sqlite``. (The other libs were already editable-installed by Phase 0.)

Parity gate (the milestone gate)
--------------------------------

::

    NO_DOCKER=true LANGGRAPH_BACKEND=rust \
      rust/ffi/langgraph-py/.venv/bin/python -m pytest \
      libs/langgraph/tests/test_pregel.py \
      -k "memory and not streaming and not interrupt and not subgraph and not send"

Result: ``81 passed, 376 deselected in 9.98s``.

Sanity check confirmed the Rust runtime is genuinely driving the loop (not a silent fallback to upstream Python): instrumenting ``run_pregel_loop_topology`` with a call counter shows it's invoked on every ``graph.invoke`` under ``LANGGRAPH_BACKEND=rust``, both ``langgraph.pregel._loop`` and ``langgraph.pregel.main`` namespaces resolve ``SyncPregelLoop`` to ``_RustSyncPregelLoop``, and ``backend.is_active()`` returns ``True``.

What the gate caught
--------------------

Nothing. The 81 tests passed on first run after the bridge wheel was rebuilt with #6c and the bridge venv had its collection-time deps installed. The handoff explicitly warned to "expect failures to send you back to #4c / langchain-ai#5 / langchain-ai#6 for incremental fixes"; that budget went unused. Plausible reasons:

1. The four channel-translation rejection sites (``CONFIG_KEY_READ`` shadow, panic-stub binop, custom-channel gate, async ``__aenter__``) cleanly cover the corners that would have been the most likely failure surfaces. The 87-test filter ``-k`` excludes the patterns those rejections would trip on (``streaming``, ``interrupt``, ``subgraph``, ``send``).
2. The local-state shadow ``CONFIG_KEY_READ`` reader the wrapper provides is enough for every conditional-edge test in the filter; none of them read channels the routing node didn't write.
3. The translation surface from sub-step #4c (1,200+ hypothesis iterations across the 10 stdlib channel classes) was already verified, so the per-class state encoding round-trips cleanly under load.

Test counts
-----------

- Cargo workspace: 220 passed; clippy clean.
- Phase 0: 73/73 corpus + 49 allowlist + strict reject; 58 conformance pass / 0 fail.
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5 + #6a + #6b + #6c (LANGGRAPH_BACKEND unset): 141 passed.
- **Combined Step 1.2b + 1.3 milestone gate (LANGGRAPH_BACKEND=rust): 81 passed / 0 failed.**
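The call-counter sanity check mentioned in the parity-gate section is a small instrumentation trick, sketched below in isolation. The `bridge` namespace, the lambda behind `run_pregel_loop_topology`, and `invoke` are hypothetical stand-ins for the real bridge module and `graph.invoke`; only the counting wrapper itself is the point.

```python
import functools
import types


def counted(fn):
    """Wrap fn so that every call increments wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


# Hypothetical stand-in for the bridge module exposing the Rust entry point.
bridge = types.SimpleNamespace(
    run_pregel_loop_topology=lambda state: {**state, "done": True}
)


def invoke(state):
    # Looks the entry point up at call time, so instrumentation applied
    # after definition is still observed.
    return bridge.run_pregel_loop_topology(state)


# Instrument, then drive a few invocations and compare counts.
bridge.run_pregel_loop_topology = counted(bridge.run_pregel_loop_topology)
```

If the counter stays at zero while results still come back, the calls are being served by some other code path, which is the silent-fallback case the check is meant to rule out.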

Nuno Campos (nfcampos) marked this pull request as draft October 5, 2023 08:55

Base automatically changed from nc/single-out-message to main October 5, 2023 09:00

Nuno Campos (nfcampos) added 4 commits October 5, 2023 10:36

Implement batch() as a single computation with multiple inputs

d363893

Each message derives from a single input

c19a56d

Simplify implementation of _transform, which now handles only a single…

0372c37

… value at a time

Fix flaky test

6460c91

Nuno Campos (nfcampos) force-pushed the nc/combined-batch branch from 21fb493 to 6460c91 Compare October 5, 2023 09:42

More failing tests

907b0bb

Nuno Campos (nfcampos) closed this Oct 13, 2023

Nuno Campos (nfcampos) deleted the nc/combined-batch branch August 26, 2024 22:30

Khuyagbaatar Batsuren (kbatsuren) mentioned this pull request Aug 27, 2024

Langgraph requires REDIS_URI which isn't used in my project and the project was running fine previously. #1489

Closed

5 tasks

rd-rugg mentioned this pull request Dec 20, 2024

Langgraph dev (cli) error with library #2841

Closed

4 tasks

Nuno Campos (nfcampos) wants to merge 5 commits into main from nc/combined-batch
