Tags · pytorch/pytorch

trunk/083e2617b8ee20b66def9e05dfcee5be84af623a

[export] avoid RecursionError in guards-fn codegen for deeply nested guards (#186993)

Summary:

`ExportedProgram.module()` builds a `_guards_fn` submodule that re-asserts the exported shape guards. For each assert's human-readable error message, `_convert_guards_code_to_fn` (in `torch/export/_unlift.py`) pretty-prints the guard via `ast.unparse(ast.parse(shadow))`. Both `ast.parse` and `ast.unparse` recurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model with `auto_dynamic_shapes` over hundreds of jagged/KJT features -- exceeds Python's recursion limit and raises `RecursionError`, aborting the entire export (including standalone publish, which reaches this code via `run_decompositions()` -> `module()`).

Root cause: the `ast.unparse(ast.parse(...))` round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separate `actual` expression and does not depend on the pretty-printed `shadow`, so a deep guard should never be fatal.

Fix: wrap the normalization in `try/except RecursionError` and fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case.
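
A minimal sketch of the fallback pattern described in the fix, assuming a standalone helper. `normalize_guard` is a hypothetical name used for illustration and is not the code in `torch/export/_unlift.py`; only the `ast.unparse(ast.parse(...))` round-trip and the `RecursionError` fallback are taken from the description above.

```python
import ast


def normalize_guard(shadow: str) -> str:
    """Return a tidied guard string for use in an assert error message.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        # Cosmetic round-trip: parsing and unparsing drops redundant parentheses.
        return ast.unparse(ast.parse(shadow))
    except RecursionError:
        # The guard is too deeply nested to walk recursively, so keep the
        # un-normalized string. Only the message text is affected.
        return shadow


# Redundant parentheses are removed when the round-trip succeeds.
print(normalize_guard("((s0) + (s1)) <= (8)"))  # prints: s0 + s1 <= 8
```

Because the helper only produces message text, the fallback path changes what a guard-failure message looks like and nothing about the check that is executed.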

Test Plan:
Built custom aps package and publish f1096406197

Added `test_guards_fn_recovers_from_unparse_recursion_error`, which mocks `ast.unparse` to raise `RecursionError` and asserts `_convert_guards_code_to_fn` still returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deep `ast.parse`/`compile` recursion can abort the process before the pure-Python `RecursionError` is reached.

```
buck2 test fbcode//caffe2/test:test_export -- --regex 'test_guards_fn_recovers_from_unparse_recursion_error'
```

After the fix: `Pass 11. Fail 0. Fatal 0.` (the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails with `RecursionError: maximum recursion depth exceeded` at `_unlift.py` (`Pass 0, Fail 11`).
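
The mock-based approach from the test plan can be sketched as follows. This is an illustration under stated assumptions, not the actual `test_guards_fn_recovers_from_unparse_recursion_error`: `normalize_guard` is a hypothetical stand-in for the normalization step, and the test functions are written in plain pytest style rather than against the export test harness.

```python
import ast
from unittest import mock


def normalize_guard(shadow: str) -> str:
    # Hypothetical stand-in for the normalization step; not the real
    # _convert_guards_code_to_fn from torch/export/_unlift.py.
    try:
        return ast.unparse(ast.parse(shadow))
    except RecursionError:
        return shadow


def test_recovers_from_unparse_recursion_error():
    guard = "((s0) + (s1)) <= (8)"
    # Force ast.unparse to raise instead of building a genuinely deep
    # expression, so the interpreter never actually recurses deeply.
    with mock.patch("ast.unparse", side_effect=RecursionError):
        assert normalize_guard(guard) == guard


def test_normalizes_when_unparse_succeeds():
    # Without the mock, the round-trip strips the redundant parentheses.
    assert normalize_guard("((s0) + (s1)) <= (8)") == "s0 + s1 <= 8"
```

Patching `ast.unparse` keeps the test deterministic across interpreters and sanitizer builds, since it does not depend on where the real recursion limit happens to fall.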

Authored with the assistance of an AI coding assistant.

Reviewed By: jijunyan, sophielin508

Differential Revision: D108111211

Pull Request resolved: #186993
Approved by: https://github.com/jijunyan

Jun 12, 2026
083e261

trunk/5ffde693e13e101c8a4f5ea685dfbaef0c7e7466

[c10] Make basic_string_view inherit from std::basic_string_view (#184152)

This PR simplifies the body of `c10::basic_string_view` and keeps only a minimal set of methods.
Pull Request resolved: #184152
Approved by: https://github.com/Skylion007

Jun 12, 2026
5ffde69

trunk/a4097e577fe5d1e21dfe2fa8c36af3fdf8854e34

Revert "[BE] Make spmd_type a CI rather than CD dependency (#187067)"

This reverts commit d4c98cd.

Reverted #187067 on behalf of https://github.com/pytorch-auto-revert. Reason: reverted automatically by PyTorch's autorevert; to avoid this behaviour, add the tag `autorevert: disable` ([comment](#187067 (comment)))

Jun 12, 2026
a4097e5

ciflow/xpu/180646

Update sdp_utils.cpp

Jun 12, 2026
53c39a7

ciflow/vllm/187114

update vllm commit hash

Jun 12, 2026
d0beccd

ciflow/trunk/187114

update vllm commit hash

Jun 12, 2026
d0beccd

ciflow/trunk/187110

Fix graph capture handling for ROCm 7.14+

Jun 12, 2026
39d0a33

ciflow/trunk/187083

Move backend-specific c10d files into per-backend subfolders (#187083)

Summary:
Pull Request resolved: #187083

Reorganizes `torch/csrc/distributed/c10d` by moving non-public, backend-specific implementation files and the TCPStore backend files into per-backend subfolders, while leaving the public-facing classes at the top level (the `ProcessGroupGloo`/`NCCL`/`MPI`/`UCC` backends and the `Store`/`TCPStore`/`FileStore`/`HashStore`/`PrefixStore` classes all stay put).

The moves are: `store/` gets `TCPStoreBackend.{cpp,hpp}` and `TCPStoreLibUvBackend.cpp`; `gloo/` gets `ProcessGroupGlooCuda.cpp`, `ProcessGroupGlooDetail.hpp`, and `GlooDeviceFactory.{cpp,hpp}`; `ucc/` gets `UCCTracing.{cpp,hpp}` and `UCCUtils.{cpp,hpp}`; `nccl/` gets `NCCLXStub.hpp`.

`NCCLUtils.{cpp,hpp}` was deliberately kept at the top level even though it is backend-specific: it is included by several call sites outside `caffe2` (in `gen_ai`, `ads_mkl`, and `fbgemm_gpu`), so relocating it would be a wider, riskier change better done on its own. As a result the new `nccl/` folder currently holds only `NCCLXStub.hpp`.

All include sites were updated, covering both the canonical `torch/csrc/distributed/c10d/...` include form and the legacy short `c10d/...` form (used by `fb/GlooDeviceFactory.cpp`). Build wiring was updated in `build_variables.bzl` -- the canonical source list consumed by CMake (via `append_filelist` in `cmake/Codegen.cmake`), OSS Bazel, and OSS Buck -- and in the internal `fb/fbcode/target_definitions.bzl` for `ProcessGroupGlooCuda.cpp`. Headers are picked up by recursive globs, so no header-list edits were needed.

This is a pure file move: contents are unchanged apart from the relocated `#include` paths, so correctness is established by a clean build rather than by behavioral tests.

Authored with the assistance of an AI coding assistant (Claude Code).

Test Plan:
Confirmed no references to the old paths remain anywhere in `fbcode`, then ran the fbcode lint and build tooling:

```
arc f
arc lint
arc lint --take AUTODEPS --apply-patches
buck2 build fbcode//caffe2:_libtorch fbcode//caffe2:_libtorch_cuda
```

`arc f` and `arc lint` reported no issues; AUTODEPS produced no dependency changes (the moves stayed within existing Buck targets); both the CPU (`_libtorch`) and CUDA (`_libtorch_cuda`) libraries built successfully (exit 0).

Reviewed By: kapilsh

Differential Revision: D108332288

Jun 12, 2026
dd9f5b7

ciflow/trunk/187067

Update

[ghstack-poisoned]

Jun 12, 2026
68013e4

ciflow/trunk/186754

Update on "[dtensor] migrating tensor ops to single dim strategies"

**Summary:**

Before
Directly registered:
  rule (register_prop_rule):               2
  op_strategy (register_op_strategy):    158
  single_dim_strategy:                  1013
  total:                                1164

After
Directly registered:
  rule (register_prop_rule):               2
  op_strategy (register_op_strategy):    114
  single_dim_strategy:                  1068
  total:                                1176

Net New Ops Added: 12
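
The per-category change implied by the Before/After counts can be worked out with a short sketch. The numbers are copied from the summary above and the totals are used as reported rather than recomputed from the rows; the dictionary layout is only an illustrative choice.

```python
# Counts copied from the Before/After summary; totals are taken as reported.
before = {"rule": 2, "op_strategy": 158, "single_dim_strategy": 1013, "total": 1164}
after = {"rule": 2, "op_strategy": 114, "single_dim_strategy": 1068, "total": 1176}

# Signed change per registration category (after minus before).
delta = {name: after[name] - before[name] for name in before}

for name, change in delta.items():
    print(f"{name}: {change:+d}")
```

This gives 44 fewer `op_strategy` registrations, 55 more `single_dim_strategy` registrations, and a reported total that grows by 12, matching the "Net New Ops Added" line.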

**Test Cases**
1. pytest test/distributed/tensor/test_tensor_ops.py





[ghstack-poisoned]

Jun 12, 2026
52a8312
