Add cycle monitoring #3

Closed
Nuno Campos (nfcampos) wants to merge 2 commits into main from nc/cycle-monitor

Conversation

@nfcampos
Contributor

Monitor for cycles (i.e., infinite loops) in the computation graph. Note that this cannot be done as a preprocessing step: one of the design goals is to allow cyclical computations, so we only want to bound these cycles at runtime in order to break infinite loops.
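
A minimal sketch of the idea, not this PR's actual implementation: cycles stay legal, and a step budget cuts off runs that never terminate. The names `run_with_cycle_limit`, `CycleLimitExceeded`, and `max_steps` are hypothetical.

```python
from typing import Callable, Dict, Optional


class CycleLimitExceeded(RuntimeError):
    """Raised when a run uses up its step budget (hypothetical name)."""


def run_with_cycle_limit(
    nodes: Dict[str, Callable[[int], Optional[str]]],
    start: str,
    max_steps: int,
) -> int:
    """Run nodes until one returns None; abort after max_steps steps.

    Each node receives the step index and returns the next node name,
    or None to stop. Cycles are legal; only unbounded ones are cut off.
    """
    current: Optional[str] = start
    steps = 0
    while current is not None:
        if steps >= max_steps:
            raise CycleLimitExceeded(f"no result after {max_steps} steps")
        current = nodes[current](steps)
        steps += 1
    return steps


# A bounded cycle: "a" loops back to itself three times, then stops.
bounded = {"a": lambda step: "a" if step < 3 else None}
# An unbounded cycle: "a" and "b" hand off to each other forever.
unbounded = {"a": lambda step: "b", "b": lambda step: "a"}
```

A static check would have to reject both graphs because both contain a cycle; the runtime budget lets the first one finish and stops only the second.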

@nfcampos Nuno Campos (nfcampos) deleted the nc/cycle-monitor branch August 26, 2024 22:30
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
Splits Phase 1 §6 Step 1.2 into 1.2a (Pregel core, differential parity
on synthetic fixtures) and 1.2b (LANGGRAPH_BACKEND=rust + async PyO3
bridge). 1.2a closes Phase 1 follow-up langchain-ai#2 (erased dyn ChannelKind);
1.2b takes over the new follow-up langchain-ai#3 (async bridge + LANGGRAPH_BACKEND).

What lands:

  - langgraph-core::pregel — algorithmic core only (no streaming, no
    interrupts, no subgraphs, no Send, no managed values, no callbacks).
    Modules:
      * channel_kind: erased dyn ChannelKind trait + impls for all 9
        Step 1.1 channels, going through serde_json at the boundary.
      * apply_writes: parity port of Python _algo.apply_writes
        (versions_seen, consume, group writes, bump-step idle pass,
        finish-pass on tentatively-last superstep).
      * prepare_tasks: PULL path of prepare_next_tasks. Task ids match
        Python's xxh3-128 byte-for-byte (xxhash-rust, UUID-formatted).
      * loop_: PregelLoop::new + with_versioning + with_parent_ns +
        with_stop builders, tick(), run(), TraceEntry/TraceTask shape
        for the parity comparison.
      * checkpoint_helpers: empty_checkpoint() + checkpoint_id_to_bytes()
        with proper InvalidCheckpointId error variant.
    All submodules private; pub use surface listed in mod.rs.

  - langgraph-py bridge: run_pregel_fixture(name, init_json,
    max_steps=None) entry point with 5 hand-rolled fixtures
    (linear_chain, conditional_fork, fan_out, recursion,
    multi_channel_reducer_mix).

  - parity/scripts/test_pregel_fixtures_via_bridge.py: 24 parametrised
    tests vs Python StateGraph, user-channel parity.

  - parity/scripts/test_pregel_differential.py: 5 hypothesis tests,
    ~1300 random iterations, 0 divergences.

Step + recursion-limit are i64 (matches Python's signed int wire type;
1.2b checkpoint interop needs the negative one-step "before any input"
state). PregelLoop fields are pub(crate); external mutation goes
through put_input/tick/run + the with_* builders. The pregel layer's
runtime channel value type is named ChannelValue (distinct from the
codec's wire-format Decoded — Phase 0 follow-up langchain-ai#2 plans to migrate
Checkpoint::channel_values to codec Decoded later).

Verification: 161 cargo tests, clippy clean, 24+5 fixture/differential
parity tests, Phase 0 round-trip + conformance still all-green.
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
Closes Phase 0 follow-up #1's tracked-by check ("at least one
#[pyfunction(async)] exposed in the bridge, plus a Python async test
that calls it"). First piece of the combined Step 1.2b + 1.3
milestone — see plan amendment in this commit and the locked
architectural decisions in rust/docs/phase-1-followups.md entry langchain-ai#3.

What lands:

  - rust/Cargo.toml: workspace deps for pyo3 0.28, pyo3-async-runtimes
    0.28, tokio 1 (rt-multi-thread, macros, sync, time). Bridge
    Cargo.toml switches to workspace-versioned pyo3.
  - rust/ffi/langgraph-py/src/async_runtime.rs: new module with
    `async_echo(s) -> awaitable[str]`. Sleeps 1ms on tokio then
    returns s.upper() — a wiring proof, not a feature. Replaced by
    real entry points (run_pregel_async, …) as Step 1.2b's surface
    grows; kept around because its asyncio test is the cheapest
    smoke check that the wiring still works after future bridge
    changes.
  - parity/scripts/test_async_bridge.py: 3 tests (await, gather,
    attribute presence). Uses anyio (matching upstream
    libs/langgraph/tests/test_pregel.py — required for the eventual
    87-test parity gate to run on the same async framework).
  - .omc/plans/langgraph-rust-port-2026-04-30.md: amends §6 to mark
    Step 1.2b + 1.3 as one combined milestone with the locked
    architectural decisions (research-backed via the two
    research/*.md files in this commit).
  - rust/docs/phase-1-followups.md: entry #1 (DeltaChannel) and
    entry langchain-ai#3 (async bridge + LANGGRAPH_BACKEND wiring) re-pointed
    to the combined milestone, with explicit decision log.
  - research/pyo3-async-pregel-2026-05-05.md +
    python-rust-backend-swap-2026-05-05.md: pplx research outputs
    that ground the locked decisions (pyo3-async-runtimes 0.28 is
    the maintained answer; Pattern B monkeypatch from a separate
    package is canonical for accelerating frozen-baseline libs).
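
For illustration only, a pure-Python stand-in for the `async_echo` wiring proof described in the list above. The real function is Rust running on tokio; here `asyncio.sleep` plays that role, and `check_await_and_gather` is a hypothetical helper mirroring the "await" and "gather" checks.

```python
import asyncio


async def async_echo(s: str) -> str:
    """Pure-Python stand-in: sleep 1 ms, then return the upper-cased input."""
    await asyncio.sleep(0.001)
    return s.upper()


async def check_await_and_gather() -> list:
    # A single await proves the awaitable resolves at all.
    single = await async_echo("ping")
    # gather returns results in argument order, whatever the finish order.
    many = await asyncio.gather(*(async_echo(w) for w in ("a", "b", "c")))
    return [single, *many]


results = asyncio.run(check_await_and_gather())
```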

GIL discipline (production pattern from polars / pydantic-core /
datafusion-python): release between Python callbacks via short
Python::with_gil scopes. The Rust future itself runs without the
GIL. Wired through pyo3-async-runtimes::tokio::future_into_py.

Parity gate: 3 anyio tests via the bridge venv's Python directly
(SESSION-RESUME's bridge install gotcha applies — uv run python
silently reverts the .so).

What the gate caught: nothing — this is foundation. The first
test_async_echo_awaits_via_tokio failure on commit-day was
"pytest-asyncio not installed" → fixed by switching to anyio
(matches upstream langgraph), which uncovered nothing about the
Rust side.

Verification:
  - 161/161 cargo tests pass (unchanged from 1.2a baseline)
  - cargo clippy --workspace --all-targets -- -D warnings clean
  - 3/3 new asyncio tests pass
  - 73/73 Phase 0 corpus + 49 allowlist + strict-reject pass
  - 58/58 conformance via Rust shadow pass
  - 19/19 Step 1.1 channels parity (10 fixtures + 9 differential)
  - 29/29 Step 1.2a Pregel parity (24 fixtures + 5 hypothesis)
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
Plumbs `Result<Vec<Write>, NodeError>` through `NodeCallable`,
`PregelLoop::tick()`, and `PregelLoop::run()`. Adds
`PregelError::NodeFailed { node, message }` so Pregel can surface
node-execution failures with the failing node's name attached.

This is the foundation for Task #4b (Python-callable Node wrapper):
when a Python node raises, the bridge will catch the `PyErr`, stash
it in a side-channel registry keyed by node name, and return
`Err(NodeError)`; the Pregel loop propagates as `NodeFailed`; the
bridge driver re-raises the original Python exception. Side-channel
storage lives in the bridge crate so `langgraph-core` stays
PyO3-free.

What lands:

  - langgraph-core::pregel::errors:
    * `NodeError { message: String }` with std::error::Error +
      Display + From<&str>/From<String>. Public.
    * `PregelError::NodeFailed { node, message }` new variant.
    * Doc-comments updated to reflect the 1.2b ownership.
  - langgraph-core::pregel::node: `NodeCallable` type alias bumped
    to `Fn(&NodeInput) -> Result<Vec<Write>, NodeError>`. Doc
    explains the bridge-side PyErr capture path.
  - langgraph-core::pregel::loop_::tick(): node call site now does
    `callable(&input).map_err(|e| PregelError::NodeFailed {
    node: name, message: e.message })?`. Hot path unchanged for
    happy returns.
  - All 12 existing closure use-sites wrap their happy-path returns
    in `Ok(...)` — 4 in `loop_.rs` tests, 1 in `prepare_tasks.rs`
    (`noop_callable`), 5 in `ffi/langgraph-py/src/pregel.rs` fixture
    builders, 2 in shared `single_i64_node` / `router_node` helpers.
    `panic!` arms still type-check via `!` coercion.
  - `pregel/mod.rs` re-exports `NodeError`.
  - phase-1-followups.md: documents 2026-05-05 scope decision that
    Task langchain-ai#3 (channel translation harness) is subsumed by the
    existing differential gate + DeltaChannel parity test (the
    "translate to Rust → translate back → equal at every step"
    invariant is what differential per-step trace equality already
    proves). The Python-`BaseChannel` → Rust-`dyn ChannelKind` glue
    moves into the async runner where it has a runtime user.

New test: `node_callable_error_surfaces_as_node_failed` proves a
node returning `Err(NodeError::new("simulated python KeyError ..."))`
surfaces as `PregelError::NodeFailed { node: "boom", message: ... }`
through `PregelLoop::run()`.

What the gate caught: nothing — pure plumbing refactor, all 12
existing closures preserved their semantics. No behavior change for
happy paths; cargo + clippy clean across the workspace.

Verification:
  - 181/181 cargo tests pass (180 prior + 1 new error-path test)
  - cargo clippy --workspace --all-targets -- -D warnings clean
  - 29/29 Step 1.2a Pregel parity (24 fixtures + 5 hypothesis) — proves
    refactor preserves behavior on the 5 fixture graphs
  - 10/10 Step 1.2b DeltaChannel parity (unchanged)
  - 3/3 Step 1.2b async-bridge wiring (unchanged)
  - All Phase 0 + Step 1.1 gates remain green (not re-run inline,
    last verified at the prior 1.2b foundation commit 5325311)
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…k bridge)

Round-trips Python `BaseChannel.checkpoint()` state through Rust
`from_checkpoint` for every stdlib channel class. Closes the channel-
translation deliverable for the combined Step 1.2b + 1.3 milestone
(`rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`). The runner monkeypatch in
sub-step langchain-ai#6 will use `extract_state` / `apply_state` to wire Python
channel instances through the Rust loop tick.

What landed
-----------
- New `rust/ffi/langgraph-py/src/channel_translate.rs` — Rust
  translation gate. Per-class `round_trip_*` functions parse the
  msgpack-encoded state, build a Rust channel via `from_checkpoint`,
  and re-encode the result. 10 stdlib classes covered (LastValue,
  LastValueAfterFinish, Topic, BinaryOperatorAggregate,
  EphemeralValue, AnyValue, UntrackedValue, NamedBarrierValue,
  NamedBarrierValueAfterFinish, DeltaChannel). Custom user-defined
  channels return `ValueError` (Python: `RustBackendUnsupported`).
- New PyO3 entry points `translate_channel_round_trip` (msgpack
  bytes in, msgpack bytes out) and `supported_channel_classes`.
- New Python helper `parity/scripts/_channel_translate.py`:
  `class_name`, `extract_state`, `apply_state`, `pack_state`,
  `unpack_state`, `RustBackendUnsupported`. Per-class dispatch via
  `_EXTRACTORS` / `_BUILDERS` / `_APPLIERS` dicts; symmetry checked
  at import time. Caller-misuse cases (missing operator/reducer/
  names) raise `ValueError`, distinct from `RustBackendUnsupported`.
- New parity test `parity/scripts/test_channel_translate.py` —
  35 tests covering bridge contract, custom-class rejection,
  caller-misuse errors, and per-class round-trip semantics for all
  10 channels.
- Per-class encoding table documented inline (Rust module docstring)
  and traced back to handoff doc.
- Workspace dep: `rmp-serde = "1"` for serde-style msgpack on the
  Rust side. Python uses `ormsgpack` (already in bridge venv).
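
A rough sketch of the per-class dispatch with an import-time symmetry check. The table contents, the dict-based toy channels, and the function signatures are assumptions made for illustration; only the names `_EXTRACTORS`, `_APPLIERS`, `extract_state`, `apply_state`, and `RustBackendUnsupported` come from the commit message.

```python
from typing import Any, Callable, Dict


class RustBackendUnsupported(Exception):
    """The channel class has no translation."""


# Toy per-class tables; channels are plain dicts in this sketch.
_EXTRACTORS: Dict[str, Callable[[Any], Any]] = {
    "LastValue": lambda ch: ch["value"],
    "Topic": lambda ch: list(ch["values"]),
}
_APPLIERS: Dict[str, Callable[[Any, Any], None]] = {
    "LastValue": lambda ch, state: ch.update(value=state),
    "Topic": lambda ch, state: ch.update(values=list(state)),
}

# Symmetry check at import time: every class that can be extracted
# must also be appliable, and vice versa.
if _EXTRACTORS.keys() != _APPLIERS.keys():
    raise ImportError(
        f"asymmetric tables: {sorted(_EXTRACTORS.keys() ^ _APPLIERS.keys())}"
    )


def extract_state(class_name: str, channel: Any) -> Any:
    try:
        extractor = _EXTRACTORS[class_name]
    except KeyError:
        raise RustBackendUnsupported(class_name) from None
    return extractor(channel)


def apply_state(class_name: str, channel: Any, state: Any) -> None:
    try:
        applier = _APPLIERS[class_name]
    except KeyError:
        raise RustBackendUnsupported(class_name) from None
    applier(channel, state)
```

Failing at import time means a class added to one table but not the other is caught before any graph runs.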

Wire format
-----------
msgpack bytes — matches the locked architectural decision in
`phase-1-followups.md` entry langchain-ai#3 §6. Python packs with `ormsgpack`,
Rust decodes via `rmp_serde` into `serde_json::Value`. Same encoding
family the rest of the project uses for checkpoint blobs; future
expansion to ext-coded values (LangChain messages, etc.) layers on
without changing the bridge surface.

Parity gate
-----------
For each channel class, drive the round-trip
  Python.checkpoint() -> msgpack -> Rust.from_checkpoint() ->
  Rust.checkpoint() -> msgpack -> Python.from_checkpoint() -> .get()
and assert the seeded Python channel observes the same state as the
original. For `DeltaChannel` snapshot blobs, the Rust side collapses
to sentinel (matching Python's invariant); we verify the replay
target instead.

What the gate caught
--------------------
- Python's `DeltaChannel.from_checkpoint(MISSING)` is asymmetric:
  it sets `value = typ()` rather than leaving the channel MISSING.
  Test `test_missing_round_trips_missing` documents the asymmetry
  with a `fresh.get() == {}` assertion.
- Subclasses of stdlib channels need a distinct error path from
  fully-custom channels: the runner ought to know "we know the
  parent shape but you customised it" vs "we have no idea what this
  is". `class_name` walks the supported-class MRO to give a precise
  message.
- Caller misuse (missing `operator` / `reducer` / `names` in
  `init_args`) is `ValueError`, not `RustBackendUnsupported`. Two
  failure modes wearing one exception turns 5-minute debugs into
  30-minute ones.
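
The two distinctions above can be sketched as follows. This is an illustrative guess at the shape, not the shipped helper: the toy `LastValue` and `Topic` classes, the message wording, and `require_init_arg` are all hypothetical.

```python
class RustBackendUnsupported(Exception):
    """The channel cannot be translated to the Rust backend."""


class LastValue:  # toy stand-ins for two of the stdlib channel classes
    pass


class Topic:
    pass


SUPPORTED = (LastValue, Topic)


def class_name(channel: object) -> str:
    """Return the supported class name, or explain why there is none."""
    cls = type(channel)
    if cls in SUPPORTED:
        return cls.__name__
    # Walk the MRO so a customised subclass gets a more precise message
    # than a class we know nothing about.
    for base in cls.__mro__[1:]:
        if base in SUPPORTED:
            raise RustBackendUnsupported(
                f"{cls.__name__} customises supported channel {base.__name__}"
            )
    raise RustBackendUnsupported(f"{cls.__name__} is not a known channel class")


def require_init_arg(init_args: dict, key: str) -> object:
    """Caller misuse is a ValueError, not RustBackendUnsupported."""
    if key not in init_args:
        raise ValueError(f"init_args is missing required key {key!r}")
    return init_args[key]


class MyTopic(Topic):
    pass


class Unrelated:
    pass
```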

Test counts
-----------
- Cargo workspace: 211 passed (was 210; +1 invalid_msgpack test).
  Clippy clean.
- Phase 0: 73/73 round-trip, 49 allowlist, strict reject; 58
  conformance.
- Phase 1 + 1.2b foundation + #4c: 104 passed (was 69; +35 channel
  translate tests).
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…doff for langchain-ai#6/langchain-ai#7

Milestone update for the combined Step 1.2b + 1.3 final stretch.

Locked architectural decision (2026-05-06)
------------------------------------------
The original plan §6 and `phase-1-followups.md` entry langchain-ai#3 §5 left the door
open to "a tighter cut decided in implementation" for the
`langgraph_rs.backend` monkeypatch — i.e., replacing only `tick()`
(approach B) or only `_algo.apply_writes` + `prepare_next_tasks`
(approach C) instead of the full `SyncPregelLoop` (approach A).

The user has explicitly chosen approach A: full `SyncPregelLoop`
replacement. Reasoning captured in the new handoff doc:

  * A is the only approach where `LANGGRAPH_BACKEND=rust` actually
    means "Rust drives the loop" — B and C still leave Python
    orchestrating most per-tick work.
  * B's per-tick re-sync of channel state is wasteful and adds an
    extra parity surface, which is a correctness risk we don't need.
  * C is essentially a third copy of `test_pregel_differential.py`'s
    coverage — buys us nothing new.
  * "Done right the first time" — the full replacement is bigger but
    architecturally honest; a tighter cut is technical debt that
    would need to be redone before Step 1.4 streaming or Phase 2.

The "re-build from checkpoint each tick" guidance from the original
phase-1-followups langchain-ai#3 §6 is also superseded: under approach A, Rust
state is constructed once at `__enter__` (Python → Rust via
`_channel_translate.extract_state`) and applied once at `__exit__`
(Rust → Python via `apply_state`). No per-tick re-sync.
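
A toy model of that lifecycle, assuming a plain dict stands in for the Rust-side state. `RustLoopSketch` and `sync_log` are hypothetical; whether the real `__exit__` also applies state after an exception is not stated here, so the sketch simply applies it unconditionally.

```python
from typing import Dict, List


class RustLoopSketch:
    """Toy model of approach A: sync in at __enter__, out at __exit__."""

    def __init__(self, python_channels: Dict[str, int]) -> None:
        self.python_channels = python_channels
        self.rust_state: Dict[str, int] = {}
        self.sync_log: List[str] = []

    def __enter__(self) -> "RustLoopSketch":
        # Python -> Rust, once (stands in for extract_state).
        self.rust_state = dict(self.python_channels)
        self.sync_log.append("extract")
        return self

    def tick(self) -> None:
        # Per-superstep work touches only the Rust-side copy.
        self.rust_state = {k: v + 1 for k, v in self.rust_state.items()}

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Rust -> Python, once (stands in for apply_state).
        self.python_channels.update(self.rust_state)
        self.sync_log.append("apply")
        return False


channels = {"count": 0}
with RustLoopSketch(channels) as loop:
    for _ in range(3):
        loop.tick()
    mid_run = dict(channels)  # Python side is still untouched here
```

During the run the Python channels keep their initial values; they only observe the final state after the `with` block exits.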

What this commit changes
------------------------
- New `rust/docs/STEP-1.2B-FINAL-HANDOFF.md`: the comprehensive
  handoff brief for the next session picking up langchain-ai#6 and langchain-ai#7. Covers:

  * Where we are (status table through `c03c7ac6`).
  * Locked architectural decision (approach A).
  * langchain-ai#6 sub-step breakdown (#6a Maturin layout switch → #6b
    backend.py monkeypatch → #6c Pregel runtime bridge entry point).
  * langchain-ai#7 iteration loop (87-test gate).
  * `__init__.py` re-export shim contents (drop-in for the layout
    switch).
  * Replaced symbols list pattern for `backend.py`.
  * `RustBackendUnsupported` rejection sites for the deliberately
    out-of-scope feature families (custom channels, subgraphs, Send,
    interrupts, stream modes outside values/updates).
  * Verification block, hard rules, bridge install gotcha,
    lessons-learned forwarding from prior handoffs.

- `rust/docs/STEP-1.2B-PARTIAL-HANDOFF.md`: prepended a
  SUPERSEDED notice pointing at the new final handoff for langchain-ai#6/langchain-ai#7. The
  partial-handoff content is preserved as historical context for what
  shipped in #4c and langchain-ai#5.

- `rust/docs/phase-1-followups.md` entry langchain-ai#3 §5 + §6: amended to
  record the approach A decision and the supersession of the
  per-tick-resync line.

- `.omc/plans/langgraph-rust-port-2026-04-30.md` §6 Step 1.2b+1.3
  Locked decisions §4 + §5: same amendments, with a pointer to the
  final handoff doc.

What's not changing
-------------------
The 5 hard architectural decisions in §6 ("combined milestone",
"async runtime: pyo3-async-runtimes", "GIL discipline", "errors via
PregelExecutionError::NodeFailed", "channel translation by class
name") remain locked. Approach A is the runtime-shape decision that
sits *above* those.

Test counts
-----------
Unchanged — pure docs commit. Latest baseline (HEAD = `c03c7ac6`):
  * Cargo: 220 passed, clippy clean.
  * Phase 0: 73/73 + 49 + strict reject; 58 conformance.
  * Phase 1 + 1.2b foundation + #4c + langchain-ai#5: 129 passed.
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…hon scaffolding)

Land the Python-side activation surface for the Rust backend.
Sub-task #6b of the combined Step 1.2b + 1.3 milestone
(`rust/docs/STEP-1.2B-FINAL-HANDOFF.md`); Rust runtime bridge entry
point follows in #6c, full 87-test gate is langchain-ai#7.

Approach choice (locked in this commit body)
--------------------------------------------
The handoff's "approach A — full SyncPregelLoop replacement" left an
implementation question: stand up a parallel duck-typed shadow class,
or subclass `SyncPregelLoop` and override only the algorithmic core?
We chose **subclass + override** after surfacing the trade-off:

  * Subclass inherits `BackgroundExecutor` / `ExitStack` / lifecycle
    event queue / status machinery / checkpoint persistence /
    `accept_push` / `output_writes` / `match_cached_writes` from
    upstream. None of those is a parity surface we want to grow in
    Python, and reimplementing 30+ methods just to hit
    `Pregel.invoke`'s read sites is the wrong cost shape for V0.1.
  * Approach A's distinguishing property over B — "Rust state lives
    across the whole graph execution; no per-tick *bidirectional*
    re-sync" — is preserved by syncing once at `__enter__` (Python
    channels → Rust) and once at `__exit__` (Rust → Python channels).
    A subclass that overrides the algorithmic core (`__enter__`
    validation + Rust seed, `tick`/`after_tick` per-superstep BSP
    work, `__exit__` flush) gets approach A's behaviour with
    significantly less Python-side parity surface than a duck-typed
    shadow.
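
The subclass-plus-override shape might look roughly like this. `UpstreamLoop` is a hypothetical stand-in for `SyncPregelLoop` with only the attributes the sketch needs, and channels are represented by their class names as strings; none of this is the actual backend code.

```python
class RustBackendUnsupported(Exception):
    """Feature deliberately outside the Rust backend's scope."""


class UpstreamLoop:
    """Stand-in for SyncPregelLoop; lifecycle machinery lives here."""

    def __init__(self, channels, stream_modes=("values",)):
        self.channels = channels          # name -> channel class name
        self.stream_modes = stream_modes
        self.entered = False

    def __enter__(self):
        self.entered = True
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def tick(self):
        return False


SUPPORTED_CHANNELS = {"LastValue", "Topic"}
SUPPORTED_STREAM_MODES = {"values", "updates"}


class RustLoop(UpstreamLoop):
    """Inherit the lifecycle, override only the algorithmic core."""

    def __enter__(self):
        super().__enter__()  # reuse upstream setup unchanged
        for name, kind in self.channels.items():
            if kind not in SUPPORTED_CHANNELS:
                raise RustBackendUnsupported(f"channel {name!r}: {kind}")
        extra = set(self.stream_modes) - SUPPORTED_STREAM_MODES
        if extra:
            raise RustBackendUnsupported(f"stream modes {sorted(extra)}")
        return self

    def tick(self):
        # Fail loudly instead of silently falling back to upstream.
        raise NotImplementedError("Rust runtime entry point lands in #6c")
```

Everything the subclass does not override keeps upstream behaviour, which is the point of the trade-off described above.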

What landed
-----------
- `rust/ffi/langgraph-py/python/langgraph_rs/backend.py` — the
  activation module:
  * `REPLACED_SYMBOLS` tuple (top-of-file, auditable diff against
    upstream): both `langgraph.pregel._loop.{Sync,Async}PregelLoop`
    AND `langgraph.pregel.main.{Sync,Async}PregelLoop`. The
    second pair is load-bearing — `pregel/main.py` imports the loop
    classes directly (`from langgraph.pregel._loop import
    SyncPregelLoop, AsyncPregelLoop`), so the runtime call sites
    at `main.py:2847` (sync) and `main.py:3299` (async) read the
    local module attribute. A single-namespace patch leaves
    `Pregel.stream` / `astream` instantiating the upstream class.
    Caught by smoke-test langchain-ai#3 below.
  * `_RustSyncPregelLoop(SyncPregelLoop)` — subclass with overrides:
    - `__enter__` calls `super().__enter__()`, then iterates
      `self.channels` validating each via
      `_channel_translate.class_name` (raises
      `RustBackendUnsupported` for custom or subclassed channels);
      then rejects unsupported `interrupt_before` / `interrupt_after`
      / non-`{values,updates}` stream modes.
    - `tick()` raises `NotImplementedError` pointing at sub-step #6c
      until the Rust runtime bridge entry point lands. Auditably
      non-functional rather than silent fallback.
  * `_RustAsyncPregelLoop(AsyncPregelLoop)` — subclass that allows
    construction (so `Pregel.astream`'s `async with` setup doesn't
    crash before the rejection) but raises `RustBackendUnsupported`
    from `__aenter__`. Async parity is a deferred follow-up after
    the 87-test sync gate is green.
  * `_install_monkeypatches()` is idempotent and gated by
    `LANGGRAPH_BACKEND=rust` at import time. `is_active()` exposes
    the install state to tests.
- `rust/ffi/langgraph-py/python/langgraph_rs/_channel_translate.py` —
  moved (via `git mv`) from `parity/scripts/_channel_translate.py`.
  Production backend code shouldn't depend on parity-test
  infrastructure; the helper now lives in the package and the parity
  test imports it from there. (Functional contents unchanged.)
- `parity/scripts/test_channel_translate.py` — import switched from
  the `sys.path.insert(...) + from _channel_translate import ...`
  shim to `from langgraph_rs._channel_translate import ...`. 35 tests
  pass unchanged.
- `parity/scripts/test_backend_activation.py` — new, 10 smoke tests
  pinning #6b's surface (replacement list, both-namespace patch,
  subclass MRO, idempotency, every rejection site, `#6c` stub
  pointer, async-`__aenter__` rejection).
- `conftest.py` (project root) — top-level pytest hook that imports
  `langgraph_rs.backend` when `LANGGRAPH_BACKEND=rust` is set. Lives
  at the repo root so it covers both `parity/scripts/` and
  `libs/langgraph/tests/` (the 87-test gate's home for langchain-ai#7); does
  nothing without the env var.

Out of scope / explicitly rejected (`RustBackendUnsupported`)
-------------------------------------------------------------
- Custom user-defined channel classes (anything not in the 10 stdlib
  set surfaced by `langgraph_rs._channel_translate`).
- `interrupt_before` / `interrupt_after` (V0.1 deliberately excludes
  interrupts; the 87-test filter excludes them via `-k "not
  interrupt"` but a filter slip surfaces here).
- Stream modes outside `values` and `updates`.
- Async (`AsyncPregelLoop`) — symbol replaced symmetrically but
  rejects at `__aenter__`.

Out of scope / deferred to #6c
------------------------------
- Subgraphs / `Send` / nested-`Pregel` rejection. Those need
  per-node introspection at `__enter__` time; deferred to #6c
  alongside the actual Rust call so we don't grow validation that
  isn't yet exercised.
- The actual Rust call. `tick()` raises `NotImplementedError`
  pointing at #6c. Activating `LANGGRAPH_BACKEND=rust` and invoking
  any graph that passes the rejection sites will fail predictably.

Parity gate
-----------
- Without `LANGGRAPH_BACKEND=rust`: every existing parity gate
  unchanged in count and result. Importing the module is a no-op,
  upstream `SyncPregelLoop`/`AsyncPregelLoop` symbols untouched.
- With `LANGGRAPH_BACKEND=rust`: 10 new smoke tests in
  `test_backend_activation.py` pin both the replacement surface and
  the rejection paths. No graph actually runs end-to-end — that's
  #6c.

What the gate caught
--------------------
1. Single-namespace monkeypatch is insufficient. First wiring patched
   only `langgraph.pregel._loop.{Sync,Async}PregelLoop`; activating
   the env var and calling `graph.invoke(...)` did NOT raise the #6c
   `NotImplementedError` because `pregel/main.py` had imported the
   class directly into its module namespace at langgraph load time,
   and the `with SyncPregelLoop(...)` call site read the local
   reference. Fixed by extending `_install_monkeypatches` to patch
   `langgraph.pregel.main` as well, and pinning that in
   `REPLACED_SYMBOLS`.
2. `_channel_translate.py` was reachable from the parity tests via a
   `sys.path.insert(...)` shim, but the backend module needs it as a
   real package import. Moved into `langgraph_rs/` so production code
   doesn't depend on `parity/scripts/` layout.
3. The async test originally tried to introspect docstrings on the
   override; that was over-engineered and brittle. Replaced with a
   direct `asyncio.run(instance.__aenter__())` pytest.raises check.
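
Item 1 comes down to ordinary Python name binding, which a pair of toy modules can reproduce. The module and class names below are hypothetical; the sketch only assumes that `from x import Y` copies the reference into the importing module's namespace.

```python
import types

# Two toy modules: `loop_mod` defines the class, `main_mod` imported it
# by name and therefore holds its own reference.
loop_mod = types.ModuleType("toy_loop")
main_mod = types.ModuleType("toy_main")


class SyncLoop:
    backend = "python"


class RustSyncLoop(SyncLoop):
    backend = "rust"


loop_mod.SyncLoop = SyncLoop
main_mod.SyncLoop = loop_mod.SyncLoop  # effect of `from toy_loop import SyncLoop`


def make_loop():
    # Resolved in main_mod's namespace at call time, like a call site there.
    return main_mod.SyncLoop()


# Patching only the defining module leaves the call site untouched.
loop_mod.SyncLoop = RustSyncLoop
after_single_patch = make_loop().backend

# Auditable list of every (module, attribute) pair that gets replaced.
REPLACED = ((loop_mod, "SyncLoop"), (main_mod, "SyncLoop"))
_state = {"installed": False}


def install() -> None:
    if _state["installed"]:  # idempotent: a second call is a no-op
        return
    for module, attr in REPLACED:
        setattr(module, attr, RustSyncLoop)
    _state["installed"] = True


install()
install()
after_full_patch = make_loop().backend
```

After the single-module patch the call site still builds the original class; only patching both namespaces switches it.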

Test counts
-----------
- Cargo workspace: 220 passed; clippy clean. No Rust changes.
- Phase 0: 73/73 corpus + 49 allowlist + strict reject;
  58 conformance pass / 0 fail.
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5 + #6a: 129 passed (unchanged;
  no LANGGRAPH_BACKEND set).
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5 + #6a + #6b: 139 passed
  (+10 backend activation tests).
Alaina Hardie (trianglegrrl) added a commit to trianglegrrl/langgraph that referenced this pull request May 6, 2026
…nly gate green

Closes the combined Step 1.2b + 1.3 milestone. The
``LANGGRAPH_BACKEND=rust`` filter on
``libs/langgraph/tests/test_pregel.py`` matches **81 tests** (the
handoff's "87" estimate was written before the test set drifted; the
``-k "memory and not streaming and not interrupt and not subgraph and
not send"`` filter is verbatim). All 81 pass on first run after
sub-step #6c landed — no triage iteration was needed.

What landed
-----------
- ``parity/scripts/run_87_test_gate.sh`` — runnable wrapper that sets
  ``NO_DOCKER=true`` (skips redis/postgres fixtures the bridge venv
  doesn't carry) and ``LANGGRAPH_BACKEND=rust``, points pytest at the
  filter, and forwards extra args. Single command for re-running
  the gate locally.
- ``rust/ffi/langgraph-py/pyproject.toml`` — added ``[gate-87]``
  dependency-group capturing the four collection-time deps the
  upstream conftest pulls in (``redis``, ``pytest-mock``, ``syrupy``,
  ``pycryptodome``). The bridge venv was missing these because the
  set is what ``libs/langgraph/.venv`` carries for its own test
  suite, not what the bridge needs for codec parity. Documenting in
  the dependency-group keeps the install command self-describing
  (``uv pip install --group gate-87``).
- ``rust/docs/phase-1-followups.md`` — entry langchain-ai#3 (async PyO3 bridge
  + ``LANGGRAPH_BACKEND=rust`` wiring) marked **closed**. Added the
  amendment note that the implementation chose subclass + override
  for ``_RustSyncPregelLoop`` (rather than the literal stand-alone
  duck-typed shadow class the prose example sketched), with the
  rationale matching the design discussion at the start of #6b.
  The async surface stays deferred — ``_RustAsyncPregelLoop``
  raises at ``__aenter__`` until a phase that needs streaming /
  ``astream`` parity owns it.

Bridge-venv setup deltas (one-time, since this commit)
------------------------------------------------------
- ``redis``, ``pytest-mock``, ``syrupy``, ``pycryptodome`` installed
  via the new ``gate-87`` dependency group.
- ``libs/checkpoint-sqlite`` and ``libs/checkpoint-postgres``
  installed in editable mode so the conftest can import
  ``langgraph.cache.sqlite``. (The other libs were already
  editable-installed by Phase 0.)

Parity gate (the milestone gate)
--------------------------------
::

    NO_DOCKER=true LANGGRAPH_BACKEND=rust \
        rust/ffi/langgraph-py/.venv/bin/python -m pytest \
        libs/langgraph/tests/test_pregel.py \
        -k "memory and not streaming and not interrupt and not subgraph and not send"

Result: ``81 passed, 376 deselected in 9.98s``. Sanity check
confirmed the Rust runtime is genuinely driving the loop (not a
silent fallback to upstream Python): instrumenting
``run_pregel_loop_topology`` with a call counter shows it's
invoked on every ``graph.invoke`` under ``LANGGRAPH_BACKEND=rust``,
both ``langgraph.pregel._loop`` and ``langgraph.pregel.main``
namespaces resolve ``SyncPregelLoop`` to ``_RustSyncPregelLoop``,
and ``backend.is_active()`` returns ``True``.
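
A call-counter check of this kind could be sketched as below. The decorator `counted` and the toy `run_loop` function are hypothetical; the commit does not show how the real instrumentation was written.

```python
import functools


def counted(fn):
    """Wrap fn so each call increments a counter on the wrapper."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@counted
def run_loop(inputs: dict) -> dict:
    # Hypothetical stand-in for the Rust entry point.
    return {"out": inputs["x"] + 1}


outputs = [run_loop({"x": x}) for x in range(3)]
```

A counter that stays at zero after an invoke would indicate a silent fallback to the upstream Python path.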

What the gate caught
--------------------
Nothing. The 81 tests passed on first run after the bridge wheel
was rebuilt with #6c and the bridge venv had its collection-time
deps installed. The handoff explicitly warned to "expect failures
to send you back to #4c / langchain-ai#5 / langchain-ai#6 for incremental fixes"; that
budget went unused. Plausible reasons:

1. The four channel-translation rejection sites
   (``CONFIG_KEY_READ`` shadow, panic-stub binop, custom-channel
   gate, async ``__aenter__``) cleanly cover the corners that
   would have been the most likely failure surfaces. The 87-test
   filter ``-k`` excludes the patterns those rejections would
   trip on (``streaming``, ``interrupt``, ``subgraph``, ``send``).
2. The local-state shadow ``CONFIG_KEY_READ`` reader the wrapper
   provides is enough for every conditional-edge test in the
   filter — none of them read channels the routing node didn't
   write.
3. The translation surface from sub-step #4c (1,200+ hypothesis
   iterations across the 10 stdlib channel classes) was already
   verified, so the per-class state encoding round-trips cleanly
   under load.

Test counts
-----------
- Cargo workspace: 220 passed; clippy clean.
- Phase 0: 73/73 corpus + 49 allowlist + strict reject;
  58 conformance pass / 0 fail.
- Phase 1 + 1.2b foundation + #4c + langchain-ai#5 + #6a + #6b + #6c
  (LANGGRAPH_BACKEND unset): 141 passed.
- **Combined Step 1.2b + 1.3 milestone gate (LANGGRAPH_BACKEND=rust):
  81 passed / 0 failed.**