
Shrink additional parser AST collections#25465

Merged
MichaReiser merged 4 commits into
mainfrom
micha/parser-shrink-more-collections
Jun 4, 2026


@MichaReiser MichaReiser commented May 29, 2026

Member

Summary

Shrink the containers of more AST node fields during parsing to reduce AST memory consumption.

This PR adds more shrink_to_fit calls to the parser. It still doesn't cover all nodes; I focused on the nodes for which we see the biggest memory improvements.

| AST node field | Retained slack removed |
| --- | --- |
| Parameters.args | 49.68 MiB |
| StmtImportFrom.names | 44.15 MiB |
| ExprTuple.elts | 25.39 MiB |
| StmtFunctionDef.decorator_list | 22.51 MiB |
| ExprDict.items | 22.42 MiB |
| ExprList.elts | 13.15 MiB |
| StmtWith.items | 11.96 MiB |
| StmtImport.names | 8.65 MiB |
| StmtIf.elif_else_clauses | 5.66 MiB |
| ExprListComp.generators | 5.26 MiB |
| ExprBoolOp.values | 4.29 MiB |
| ExprGenerator.generators | 3.37 MiB |
| StmtTry.handlers | 2.09 MiB |
| Parameters.kwonlyargs | 1.76 MiB |
| Parameters.posonlyargs | 1.69 MiB |
| StmtClassDef.decorator_list | 1.60 MiB |
| ExprDictComp.generators | 1.41 MiB |
| Comprehension.ifs | 1.12 MiB |
| ExprSetComp.generators | 753.1 KiB |
| Total | 226.92 MiB |

I focused on nodes where the saving was at least one MiB, with the exception of ExprSetComp which shares its parsing with ExprListComp.
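For readers unfamiliar with the pattern, here is a minimal sketch of what shrinking a collection after it has been built looks like. `Element` and `parse_elements` are hypothetical names chosen for illustration; they are not taken from the ruff parser, and the real call sites may look different.

```rust
/// Hypothetical stand-in for an AST child node; not a ruff type.
#[derive(Debug, Clone, PartialEq)]
struct Element(String);

/// Hypothetical parser routine: collects elements one at a time, then
/// releases unused capacity. Returns the elements together with the
/// capacity observed before shrinking.
fn parse_elements(source: &[&str]) -> (Vec<Element>, usize) {
    let mut elts = Vec::new();
    for token in source {
        // A push that exceeds the current capacity grows the buffer,
        // typically by more than one slot, which leaves spare capacity.
        elts.push(Element(token.to_string()));
    }
    let capacity_before = elts.capacity();
    // Ask the vector to drop the spare capacity. This may reallocate, and
    // the standard library allows some excess capacity to remain.
    elts.shrink_to_fit();
    (elts, capacity_before)
}
```

Calling `parse_elements(&["a", "b", "c", "d", "e"])` returns five elements whose capacity is no larger than it was before the shrink.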

These numbers are from parsing all Python files in pytest, typeshed, home-assistant-core, anyio, mypy, fastapi, pandas, flake8, trio, sphinx, and prefect.
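The PR does not say how the slack figures were collected. One plausible reading, shown here as an assumption rather than the actual measurement code, is that "retained slack" is the unused capacity of a vector multiplied by the element size. `retained_slack_bytes` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes held by a vector's unused capacity.
/// Assumes slack is defined as (capacity - len) * size_of::<T>().
fn retained_slack_bytes<T>(v: &Vec<T>) -> usize {
    // capacity() is never smaller than len(), so this cannot underflow.
    (v.capacity() - v.len()) * size_of::<T>()
}
```

For a `Vec<u64>` created with `Vec::with_capacity(8)` and holding three elements, the helper reports at least 40 bytes, since at least five 8-byte slots are unused.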

Performance

This PR introduces a small performance regression because the shrink_to_fit calls often require reallocating the collections. The ty project walltime benchmarks and the formatter and linter benchmarks range from neutral to -1%. The parser benchmarks range from -1% to -4%, which is roughly what I'd expect.

astral-sh-bot Bot commented May 29, 2026

Memory usage report

Summary

| Project | Old | New | Diff | Outcome |
| --- | --- | --- | --- | --- |
| flake8 | 41.41MB | 39.49MB | -4.63% (1.92MB) | ⬇️ |
| trio | 103.82MB | 100.95MB | -2.76% (2.87MB) | ⬇️ |
| sphinx | 250.00MB | 246.54MB | -1.39% (3.47MB) | ⬇️ |
| prefect | 674.31MB | 670.65MB | -0.54% (3.65MB) | ⬇️ |

Significant changes

flake8

| Name | Old | New | Diff | Outcome |
| --- | --- | --- | --- | --- |
| parsed_module | 13.71MB | 11.79MB | -13.97% (1.92MB) | ⬇️ |

trio

| Name | Old | New | Diff | Outcome |
| --- | --- | --- | --- | --- |
| parsed_module | 21.02MB | 18.15MB | -13.65% (2.87MB) | ⬇️ |

sphinx

| Name | Old | New | Diff | Outcome |
| --- | --- | --- | --- | --- |
| parsed_module | 25.63MB | 22.16MB | -13.52% (3.47MB) | ⬇️ |

prefect

| Name | Old | New | Diff | Outcome |
| --- | --- | --- | --- | --- |
| parsed_module | 26.90MB | 23.25MB | -13.57% (3.65MB) | ⬇️ |

astral-sh-bot Bot commented May 29, 2026

ecosystem-analyzer results

No diagnostic changes detected ✅

Full report with detailed diff (timing results)

astral-sh-bot Bot commented May 29, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

codspeed-hq Bot commented May 29, 2026

Merging this PR will improve performance by 6.52%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 29 improved benchmarks
✅ 96 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Memory | parser[pydantic/types.py] | 368.5 KB | 322.7 KB | +14.2% |
| Memory | parser[unicode/pypinyin.py] | 52.6 KB | 46.5 KB | +13.15% |
| Memory | parser[numpy/globals.py] | 16 KB | 14.3 KB | +12.44% |
| Memory | parser[numpy/ctypeslib.py] | 166.5 KB | 150.8 KB | +10.38% |
| Memory | ty_micro[pandas_tdd] | 39.4 MB | 36.4 MB | +8.06% |
| Memory | ty_micro[very_large_tuple] | 13.3 MB | 12.5 MB | +6.65% |
| Memory | ty_micro[gradual_vararg_call] | 13.1 MB | 12.3 MB | +6.63% |
| Memory | ty_micro[vararg_parameter_type_accumulation] | 12 MB | 11.3 MB | +6.34% |
| Memory | ty_micro[complex_constrained_attributes_2] | 13.5 MB | 12.7 MB | +6.09% |
| Memory | ty_micro[complex_constrained_attributes_1] | 13.5 MB | 12.7 MB | +6.08% |
| Memory | ty_micro[many_tuple_assignments] | 14 MB | 13.2 MB | +5.87% |
| Memory | ty_micro[literal_match_fallthrough] | 12.4 MB | 11.7 MB | +5.81% |
| Memory | parser[large/dataset.py] | 816.8 KB | 772.2 KB | +5.78% |
| Memory | ty_micro[many_string_assignments] | 14.6 MB | 13.8 MB | +5.73% |
| Memory | DateType | 22.9 MB | 21.6 MB | +5.7% |
| Memory | ty_micro[many_enum_members_2] | 14.6 MB | 13.8 MB | +5.67% |
| Memory | ty_micro[large_isinstance_narrowing] | 14.5 MB | 13.7 MB | +5.65% |
| Memory | ty_micro[complex_constrained_attributes_3] | 15 MB | 14.2 MB | +5.57% |
| Memory | ty_micro[many_tuple_assignments] | 12.9 MB | 12.2 MB | +5.56% |
| Memory | ty_micro[typevar_mapping_small_accumulations] | 14.8 MB | 14 MB | +5.54% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Comparing micha/parser-shrink-more-collections (d8914fe) with main (06dd02b) (see footnote 1)

Footnotes

  1. No successful run was found on main (4828dec) during the generation of this report, so 06dd02b was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Comment thread crates/ruff_python_parser/src/parser/expression.rs Outdated
@MichaReiser MichaReiser added the parser Related to the parser label Jun 2, 2026
@MichaReiser MichaReiser marked this pull request as ready for review June 2, 2026 06:34
@MichaReiser MichaReiser requested a review from dhruvmanila as a code owner June 2, 2026 06:34
@dhruvmanila

Member

> These numbers are from parsing all Python files in pytest, typeshed, home-assistant-core, anyio, mypy, fastapi, pandas, flake8, trio, sphinx, and prefect.

Are the numbers the total reduction across all these repositories or an average reduction?

@dhruvmanila dhruvmanila left a comment

Member

Seems reasonable, thanks!

@MichaReiser

Member Author

> Are the numbers total reduction across all these repositories or an average reduction ?

It's the total reduction across all repositories.

@MichaReiser MichaReiser merged commit 10ccd51 into main Jun 4, 2026
58 checks passed
@MichaReiser MichaReiser deleted the micha/parser-shrink-more-collections branch June 4, 2026 09:14
Labels

memory parser Related to the parser

2 participants