Skip to content

[parser] Bound iterative expression chains to avoid stack overflow#25462

Closed
samuelcolvin wants to merge 2 commits into
astral-sh:mainfrom
samuelcolvin:fix-deep-expression-stack-overflow
Closed

[parser] Bound iterative expression chains to avoid stack overflow#25462
samuelcolvin wants to merge 2 commits into
astral-sh:mainfrom
samuelcolvin:fix-deep-expression-stack-overflow

Conversation

@samuelcolvin

Copy link
Copy Markdown
Contributor

I'm not convinced you'll merge this, but without it #24810 isn't that useful - code can build extremely deep ast nodes which stackoverflow when the panic.

Happy to discuss a better approach, I wanted to get feedback before going further.


Summary

Left-associative binary chains (1+1+...) and postfix chains (f()()..., a[0][0]..., a.b.c...) are parsed in iterative loops (precedence climbing / postfix dispatch) rather than via recursion, so they bypass the with_recursion depth guard. The lexer nesting() guard only catches accumulating brackets like f(f(...)); chained postfix keeps nesting balanced (and . has no bracket at all), so it slips through entirely.

The result is a tree whose depth equals the operator/postfix count. It parses fine, but the derived drop glue recurses once per level, so dropping (or otherwise traversing) a sufficiently deep tree overflows the native stack -- a limit-independent DoS for consumers that parse untrusted source.

Count each chain level against depth_remaining in both loops so the total tree depth stays within max_recursion_depth, yielding a clean RecursionLimitExceeded instead of an abort. Flat and/comparison chains are unaffected: they flatten into a single BoolOp/Compare node and are safe to drop at any length.

Test Plan

Left-associative binary chains (`1+1+...`) and postfix chains
(`f()()...`, `a[0][0]...`, `a.b.c...`) are parsed in iterative loops
(precedence climbing / postfix dispatch) rather than via recursion, so
they bypass the `with_recursion` depth guard. The lexer `nesting()`
guard only catches *accumulating* brackets like `f(f(...))`; chained
postfix keeps nesting balanced (and `.` has no bracket at all), so it
slips through entirely.

The result is a tree whose depth equals the operator/postfix count. It
parses fine, but the derived drop glue recurses once per level, so
dropping (or otherwise traversing) a sufficiently deep tree overflows
the native stack -- a limit-independent DoS for consumers that parse
untrusted source.

Count each chain level against `depth_remaining` in both loops so the
total tree depth stays within `max_recursion_depth`, yielding a clean
`RecursionLimitExceeded` instead of an abort. Flat `and`/comparison
chains are unaffected: they flatten into a single `BoolOp`/`Compare`
node and are safe to drop at any length.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@astral-sh-bot

astral-sh-bot Bot commented May 29, 2026

Copy link
Copy Markdown

Memory usage report

Memory usage unchanged ✅

@astral-sh-bot

astral-sh-bot Bot commented May 29, 2026

Copy link
Copy Markdown

ecosystem-analyzer results

Lint rule Added Removed Changed
unresolved-attribute 5 0 0
invalid-syntax 4 0 0
Total 9 0 0

Raw diff:

sympy (https://github.com/sympy/sympy)
+ sympy/polys/numberfields/resolvent_lookup.py:225:55 error[invalid-syntax] Source is too deeply nested
+ sympy/polys/numberfields/resolvent_lookup.py:458:1 error[invalid-syntax] Expected `)`, found end of file
+ sympy/polys/polyquinticconst.py:80:5699 error[invalid-syntax] Source is too deeply nested
+ sympy/polys/polyquinticconst.py:189:1 error[invalid-syntax] Expected dedent, found end of file
+ sympy/polys/polyroots.py:579:34 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `zeta`
+ sympy/polys/polyroots.py:580:9 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `T`
+ sympy/polys/polyroots.py:590:10 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `l0`
+ sympy/polys/polyroots.py:598:13 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `order`
+ sympy/polys/polyroots.py:655:12 error[unresolved-attribute] Object of type `PolyQuintic` has no attribute `uv`

Full report with detailed diff (timing results)

@MichaReiser

Copy link
Copy Markdown
Member

This limit is certainly too low as is demonstrated by the ecosystem results and guessing the right limit here seems tricky

Comment on lines +268 to +271
// unbounded depth. That tree parses fine but overflows the native stack
// when it is later traversed or dropped (the derived drop glue recurses
// once per level). Counting the spine against `depth_remaining` keeps
// the total tree depth within `max_recursion_depth`.

@MichaReiser MichaReiser May 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this clearly says that this is not a parser bug. What the right "max-depth" is depends on the concrete application code, specifically, by how much the stack frame grows between visiting two expression nodes. This is also what I pointed out in the previous PR.

The solution in user code are to:

a) Don't use a visitor and also use loop unrolling
b) Increase the stack size. This does not solve the problem but it mitigates it and works fine for ty (and I suspect pyrefly)
c) Use a stack growing library https://docs.rs/stacker/latest/stacker/

Given this. I don't think we should change anything here in the parser because there's no correct depth. I also don't think we should make this configurable. This should be handled by the AST user.

@MichaReiser MichaReiser May 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example for how to use stacker #25464

I don't know what platforms monty support but it seems close to zero cost. https://crates.io/crates/psm

@astral-sh-bot

astral-sh-bot Bot commented May 29, 2026

Copy link
Copy Markdown

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser

MichaReiser commented May 29, 2026

Copy link
Copy Markdown
Member

This change is also not CPython compatible (it turns out, the version on main now also rejects valid CPython programs).

For a binary 1 + 1 + ... chain, ast.parse succeeded through 9,000 additions and raised RecursionError at 10,000 in this runtime. That is further evidence that CPython allows these through parsing and rejects them in downstream recursive handling, rather than applying a low parser nesting limit.

The tricky part about the recursion limit is that CPython's limit is based on when it reaches Python's stack limit. That's not something we want or even can easily mimic.

CPython also has another 6000 limit that we currently don't implement. That means, Ruff's parser now only accepts a subset of valid CPython programs (oops). However, that limit is also based on platform (wasm) and profile (debug/release).

I think the most important goal for Ruff's parser is that it can parse any valid CPython program. But it seems impossible to guarantee this on any platform because different platforms use different stack sizes and different profiles have different stack frames. In the end, this really becomes a best effort and not something that the Ruff parser can or IMO, should guaranteed.

@samuelcolvin

samuelcolvin commented May 30, 2026

Copy link
Copy Markdown
Contributor Author

I think the most important goal for Ruff's parser is that it can parse any valid CPython program.

I think that's a reasonable aim. But also not overflowing the stack (or creating rust structs that will overflow the stack when they're dropped) is imporant I think.

➤ uvx python -c "open('stack_overflow.py', 'w').write('1' + '+1' * 15000)" && uvx ty check stack_overflow.py
Installed 1 package in 4ms

thread '<unknown>' (4565661) has overflowed its stack
fatal runtime error: stack overflow, aborting

The parser now caps iterative binary/postfix chains at
`max_recursion_depth`, so `1 + 1 + ... + 1` (2000 operators) is reported
as `Source is too deeply nested` rather than building a tree deep enough
to overflow the stack. Update the two ty regression tests that exercised
this shape to expect the syntax error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@astral-sh-bot

astral-sh-bot Bot commented May 30, 2026

Copy link
Copy Markdown

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 91.94%. The percentage of expected errors that received a diagnostic held steady at 87.09%. The number of fully passing files held steady at 92/134.

@samuelcolvin

Copy link
Copy Markdown
Contributor Author

closed in favour of #25480

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants