Skip to content

[export] avoid RecursionError in guards-fn codegen for deeply nested guards (#186993)#186993

Closed
kqfu wants to merge 1 commit into
pytorch:mainfrom
kqfu:export-D108111211
Closed

[export] avoid RecursionError in guards-fn codegen for deeply nested guards (#186993)#186993
kqfu wants to merge 1 commit into
pytorch:mainfrom
kqfu:export-D108111211

Conversation

@kqfu

@kqfu kqfu commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary:

ExportedProgram.module() builds a _guards_fn submodule that re-asserts the exported shape guards. For each assert's human-readable error message, _convert_guards_code_to_fn (in torch/export/_unlift.py) pretty-prints the guard via ast.unparse(ast.parse(shadow)). Both ast.parse and ast.unparse recurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model with auto_dynamic_shapes over hundreds of jagged/KJT features -- exceeds Python's recursion limit and raises RecursionError, aborting the entire export (including standalone publish, which reaches this code via run_decompositions() -> module()).

Root cause: the ast.unparse(ast.parse(...)) round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separate actual expression and does not depend on the pretty-printed shadow, so a deep guard should never be fatal.

Fix: wrap the normalization in try/except RecursionError and fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case.

Test Plan:
Built custom aps package and publish f1096406197

Added test_guards_fn_recovers_from_unparse_recursion_error, which mocks ast.unparse to raise RecursionError and asserts _convert_guards_code_to_fn still returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deep ast.parse/compile recursion can abort the process before the pure-Python RecursionError is reached.

buck2 test fbcode//caffe2/test:test_export -- --regex 'test_guards_fn_recovers_from_unparse_recursion_error'

After the fix: Pass 11. Fail 0. Fatal 0. (the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails with RecursionError: maximum recursion depth exceeded at _unlift.py (Pass 0, Fail 11).

Authored with the assistance of an AI coding assistant.

Reviewed By: jijunyan, sophielin508

Differential Revision: D108111211

@pytorch-bot

pytorch-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/186993

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 6 Pending

As of commit 029b577 with merge base 1979118 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync

meta-codesync Bot commented Jun 10, 2026

Copy link
Copy Markdown

@kqfu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108111211.

@meta-codesync meta-codesync Bot changed the title [export] avoid RecursionError in guards-fn codegen for deeply nested guards [export] avoid RecursionError in guards-fn codegen for deeply nested guards (#186993) Jun 11, 2026
@kqfu kqfu force-pushed the export-D108111211 branch from 65088e4 to 33c24f4 Compare June 11, 2026 16:58
kqfu added a commit to kqfu/pytorch that referenced this pull request Jun 11, 2026
…guards (pytorch#186993)

Summary:

`ExportedProgram.module()` builds a `_guards_fn` submodule that re-asserts the exported shape guards. For each assert's human-readable error message, `_convert_guards_code_to_fn` (in `torch/export/_unlift.py`) pretty-prints the guard via `ast.unparse(ast.parse(shadow))`. Both `ast.parse` and `ast.unparse` recurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model with `auto_dynamic_shapes` over hundreds of jagged/KJT features -- exceeds Python's recursion limit and raises `RecursionError`, aborting the entire export (including standalone publish, which reaches this code via `run_decompositions()` -> `module()`).

Root cause: the `ast.unparse(ast.parse(...))` round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separate `actual` expression and does not depend on the pretty-printed `shadow`, so a deep guard should never be fatal.

Fix: wrap the normalization in `try/except RecursionError` and fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case.

Test Plan:
Built custom aps package and publish f1096406197

Added `test_guards_fn_recovers_from_unparse_recursion_error`, which mocks `ast.unparse` to raise `RecursionError` and asserts `_convert_guards_code_to_fn` still returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deep `ast.parse`/`compile` recursion can abort the process before the pure-Python `RecursionError` is reached.

```
buck2 test fbcode//caffe2/test:test_export -- --regex 'test_guards_fn_recovers_from_unparse_recursion_error'
```

After the fix: `Pass 11. Fail 0. Fatal 0.` (the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails with `RecursionError: maximum recursion depth exceeded` at `_unlift.py` (`Pass 0, Fail 11`).

Authored with the assistance of an AI coding assistant.

Reviewed By: jijunyan, sophielin508

Differential Revision: D108111211
…guards (pytorch#186993)

Summary:

`ExportedProgram.module()` builds a `_guards_fn` submodule that re-asserts the exported shape guards. For each assert's human-readable error message, `_convert_guards_code_to_fn` (in `torch/export/_unlift.py`) pretty-prints the guard via `ast.unparse(ast.parse(shadow))`. Both `ast.parse` and `ast.unparse` recurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model with `auto_dynamic_shapes` over hundreds of jagged/KJT features -- exceeds Python's recursion limit and raises `RecursionError`, aborting the entire export (including standalone publish, which reaches this code via `run_decompositions()` -> `module()`).

Root cause: the `ast.unparse(ast.parse(...))` round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separate `actual` expression and does not depend on the pretty-printed `shadow`, so a deep guard should never be fatal.

Fix: wrap the normalization in `try/except RecursionError` and fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case.

Test Plan:
Built custom aps package and publish f1096406197

Added `test_guards_fn_recovers_from_unparse_recursion_error`, which mocks `ast.unparse` to raise `RecursionError` and asserts `_convert_guards_code_to_fn` still returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deep `ast.parse`/`compile` recursion can abort the process before the pure-Python `RecursionError` is reached.

```
buck2 test fbcode//caffe2/test:test_export -- --regex 'test_guards_fn_recovers_from_unparse_recursion_error'
```

After the fix: `Pass 11. Fail 0. Fatal 0.` (the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails with `RecursionError: maximum recursion depth exceeded` at `_unlift.py` (`Pass 0, Fail 11`).

Authored with the assistance of an AI coding assistant.

Reviewed By: jijunyan, sophielin508

Differential Revision: D108111211
@kqfu kqfu force-pushed the export-D108111211 branch from 33c24f4 to 029b577 Compare June 11, 2026 16:59

@jijunyan jijunyan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 11, 2026
@meta-codesync

meta-codesync Bot commented Jun 12, 2026

Copy link
Copy Markdown

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot

Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants