[export] avoid RecursionError in guards-fn codegen for deeply nested guards (#186993)#186993
Closed
kqfu wants to merge 1 commit into
Closed
[export] avoid RecursionError in guards-fn codegen for deeply nested guards (#186993)#186993kqfu wants to merge 1 commit into
kqfu wants to merge 1 commit into
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/186993
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 6 PendingAs of commit 029b577 with merge base 1979118 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@kqfu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108111211. |
kqfu
added a commit
to kqfu/pytorch
that referenced
this pull request
Jun 11, 2026
…guards (pytorch#186993) Summary: `ExportedProgram.module()` builds a `_guards_fn` submodule that re-asserts the exported shape guards. For each assert's human-readable error message, `_convert_guards_code_to_fn` (in `torch/export/_unlift.py`) pretty-prints the guard via `ast.unparse(ast.parse(shadow))`. Both `ast.parse` and `ast.unparse` recurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model with `auto_dynamic_shapes` over hundreds of jagged/KJT features -- exceeds Python's recursion limit and raises `RecursionError`, aborting the entire export (including standalone publish, which reaches this code via `run_decompositions()` -> `module()`). Root cause: the `ast.unparse(ast.parse(...))` round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separate `actual` expression and does not depend on the pretty-printed `shadow`, so a deep guard should never be fatal. Fix: wrap the normalization in `try/except RecursionError` and fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case. Test Plan: Built custom aps package and publish f1096406197 Added `test_guards_fn_recovers_from_unparse_recursion_error`, which mocks `ast.unparse` to raise `RecursionError` and asserts `_convert_guards_code_to_fn` still returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deep `ast.parse`/`compile` recursion can abort the process before the pure-Python `RecursionError` is reached. ``` buck2 test fbcode//caffe2/test:test_export -- --regex 'test_guards_fn_recovers_from_unparse_recursion_error' ``` After the fix: `Pass 11. Fail 0. Fatal 0.` (the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails with `RecursionError: maximum recursion depth exceeded` at `_unlift.py` (`Pass 0, Fail 11`). Authored with the assistance of an AI coding assistant. Reviewed By: jijunyan, sophielin508 Differential Revision: D108111211
…guards (pytorch#186993) Summary: `ExportedProgram.module()` builds a `_guards_fn` submodule that re-asserts the exported shape guards. For each assert's human-readable error message, `_convert_guards_code_to_fn` (in `torch/export/_unlift.py`) pretty-prints the guard via `ast.unparse(ast.parse(shadow))`. Both `ast.parse` and `ast.unparse` recurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model with `auto_dynamic_shapes` over hundreds of jagged/KJT features -- exceeds Python's recursion limit and raises `RecursionError`, aborting the entire export (including standalone publish, which reaches this code via `run_decompositions()` -> `module()`). Root cause: the `ast.unparse(ast.parse(...))` round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separate `actual` expression and does not depend on the pretty-printed `shadow`, so a deep guard should never be fatal. Fix: wrap the normalization in `try/except RecursionError` and fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case. Test Plan: Built custom aps package and publish f1096406197 Added `test_guards_fn_recovers_from_unparse_recursion_error`, which mocks `ast.unparse` to raise `RecursionError` and asserts `_convert_guards_code_to_fn` still returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deep `ast.parse`/`compile` recursion can abort the process before the pure-Python `RecursionError` is reached. ``` buck2 test fbcode//caffe2/test:test_export -- --regex 'test_guards_fn_recovers_from_unparse_recursion_error' ``` After the fix: `Pass 11. Fail 0. Fatal 0.` (the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails with `RecursionError: maximum recursion depth exceeded` at `_unlift.py` (`Pass 0, Fail 11`). Authored with the assistance of an AI coding assistant. Reviewed By: jijunyan, sophielin508 Differential Revision: D108111211
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Collaborator
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
ExportedProgram.module()builds a_guards_fnsubmodule that re-asserts the exported shape guards. For each assert's human-readable error message,_convert_guards_code_to_fn(intorch/export/_unlift.py) pretty-prints the guard viaast.unparse(ast.parse(shadow)). Bothast.parseandast.unparserecurse once per AST node, so a guard whose expression is very deeply nested -- e.g. a sum over many symbolic sizes, as produced when exporting a recommendation model withauto_dynamic_shapesover hundreds of jagged/KJT features -- exceeds Python's recursion limit and raisesRecursionError, aborting the entire export (including standalone publish, which reaches this code viarun_decompositions()->module()).Root cause: the
ast.unparse(ast.parse(...))round-trip is purely cosmetic; as the existing comment states, it "is not necessary for correctness, just deemed desirable" -- it only normalizes redundant parentheses in the assert error string. The executed runtime check uses the separateactualexpression and does not depend on the pretty-printedshadow, so a deep guard should never be fatal.Fix: wrap the normalization in
try/except RecursionErrorand fall back to the un-normalized guard string. The emitted runtime assert is unchanged; only the readability of the guard-failure message degrades slightly in the rare deep-guard case.Test Plan:
Built custom aps package and publish f1096406197
Added
test_guards_fn_recovers_from_unparse_recursion_error, which mocksast.unparseto raiseRecursionErrorand asserts_convert_guards_code_to_fnstill returns a guards fn instead of propagating the error. A mock is used rather than a genuinely deep expression because the test target is ASAN-instrumented, where deepast.parse/compilerecursion can abort the process before the pure-PythonRecursionErroris reached.After the fix:
Pass 11. Fail 0. Fatal 0.(the test is fanned out across export modes: strict, nonstrict, serdes, retraceability, cpp_serdes, training_ir, nativert, ...). Before the fix the same test fails withRecursionError: maximum recursion depth exceededat_unlift.py(Pass 0, Fail 11).Authored with the assistance of an AI coding assistant.
Reviewed By: jijunyan, sophielin508
Differential Revision: D108111211