[parser] Bound iterative expression chains to avoid stack overflow#25462
[parser] Bound iterative expression chains to avoid stack overflow#25462samuelcolvin wants to merge 2 commits into
Conversation
Left-associative binary chains (`1+1+...`) and postfix chains (`f()()...`, `a[0][0]...`, `a.b.c...`) are parsed in iterative loops (precedence climbing / postfix dispatch) rather than via recursion, so they bypass the `with_recursion` depth guard. The lexer `nesting()` guard only catches *accumulating* brackets like `f(f(...))`; chained postfix keeps nesting balanced (and `.` has no bracket at all), so it slips through entirely. The result is a tree whose depth equals the operator/postfix count. It parses fine, but the derived drop glue recurses once per level, so dropping (or otherwise traversing) a sufficiently deep tree overflows the native stack -- a limit-independent DoS for consumers that parse untrusted source. Count each chain level against `depth_remaining` in both loops so the total tree depth stays within `max_recursion_depth`, yielding a clean `RecursionLimitExceeded` instead of an abort. Flat `and`/comparison chains are unaffected: they flatten into a single `BoolOp`/`Compare` node and are safe to drop at any length. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Memory usage reportMemory usage unchanged ✅ |
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
unresolved-attribute |
5 | 0 | 0 |
invalid-syntax |
4 | 0 | 0 |
| Total | 9 | 0 | 0 |
Raw diff:
sympy (https://github.com/sympy/sympy)
+ sympy/polys/numberfields/resolvent_lookup.py:225:55 error[invalid-syntax] Source is too deeply nested
+ sympy/polys/numberfields/resolvent_lookup.py:458:1 error[invalid-syntax] Expected `)`, found end of file
+ sympy/polys/polyquinticconst.py:80:5699 error[invalid-syntax] Source is too deeply nested
+ sympy/polys/polyquinticconst.py:189:1 error[invalid-syntax] Expected dedent, found end of file
+ sympy/polys/polyroots.py:579:34 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `zeta`
+ sympy/polys/polyroots.py:580:9 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `T`
+ sympy/polys/polyroots.py:590:10 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `l0`
+ sympy/polys/polyroots.py:598:13 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `order`
+ sympy/polys/polyroots.py:655:12 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `uv`|
This limit is certainly too low as is demonstrated by the ecosystem results and guessing the right limit here seems tricky |
| // unbounded depth. That tree parses fine but overflows the native stack | ||
| // when it is later traversed or dropped (the derived drop glue recurses | ||
| // once per level). Counting the spine against `depth_remaining` keeps | ||
| // the total tree depth within `max_recursion_depth`. |
There was a problem hiding this comment.
I think this clearly says that this is not a parser bug. What the right "max-depth" is depends on the concrete application code, specifically, by how much the stack frame grows between visiting two expression nodes. This is also what I pointed out in the previous PR.
The solution in user code are to:
a) Don't use a visitor and also use loop unrolling
b) Increase the stack size. This does not solve the problem but it mitigates it and works fine for ty (and I suspect pyrefly)
c) Use a stack growing library https://docs.rs/stacker/latest/stacker/
Given this. I don't think we should change anything here in the parser because there's no correct depth. I also don't think we should make this configurable. This should be handled by the AST user.
There was a problem hiding this comment.
Here's an example for how to use stacker #25464
I don't know what platforms monty support but it seems close to zero cost. https://crates.io/crates/psm
|
|
This change is also not CPython compatible (it turns out, the version on main now also rejects valid CPython programs).
The tricky part about the recursion limit is that CPython's limit is based on when it reaches Python's stack limit. That's not something we want or even can easily mimic. CPython also has another 6000 limit that we currently don't implement. That means, Ruff's parser now only accepts a subset of valid CPython programs (oops). However, that limit is also based on platform (wasm) and profile (debug/release). I think the most important goal for Ruff's parser is that it can parse any valid CPython program. But it seems impossible to guarantee this on any platform because different platforms use different stack sizes and different profiles have different stack frames. In the end, this really becomes a best effort and not something that the Ruff parser can or IMO, should guaranteed. |
I think that's a reasonable aim. But also not overflowing the stack (or creating rust structs that will overflow the stack when they're dropped) is imporant I think. |
The parser now caps iterative binary/postfix chains at `max_recursion_depth`, so `1 + 1 + ... + 1` (2000 operators) is reported as `Source is too deeply nested` rather than building a tree deep enough to overflow the stack. Update the two ty regression tests that exercised this shape to expect the syntax error. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Typing conformance resultsNo changes detected ✅Current numbersThe percentage of diagnostics emitted that were expected errors held steady at 91.94%. The percentage of expected errors that received a diagnostic held steady at 87.09%. The number of fully passing files held steady at 92/134. |
|
closed in favour of #25480 |
I'm not convinced you'll merge this, but without it #24810 isn't that useful - code can build extremely deep ast nodes which stackoverflow when the panic.
Happy to discuss a better approach, I wanted to get feedback before going further.
Summary
Left-associative binary chains (
1+1+...) and postfix chains (f()()...,a[0][0]...,a.b.c...) are parsed in iterative loops (precedence climbing / postfix dispatch) rather than via recursion, so they bypass thewith_recursiondepth guard. The lexernesting()guard only catches accumulating brackets likef(f(...)); chained postfix keeps nesting balanced (and.has no bracket at all), so it slips through entirely.The result is a tree whose depth equals the operator/postfix count. It parses fine, but the derived drop glue recurses once per level, so dropping (or otherwise traversing) a sufficiently deep tree overflows the native stack -- a limit-independent DoS for consumers that parse untrusted source.
Count each chain level against
depth_remainingin both loops so the total tree depth stays withinmax_recursion_depth, yielding a cleanRecursionLimitExceededinstead of an abort. Flatand/comparison chains are unaffected: they flatten into a singleBoolOp/Comparenode and are safe to drop at any length.Test Plan