Stop collecting links for non-working deduping - Reduce peak memory by 50% during long resolves #13843

Merged
notatallshaw merged 4 commits into pypa:main from notatallshaw:stop-collecting-links on Apr 7, 2026

@notatallshaw notatallshaw commented Mar 7, 2026

Member

Fixes #12834

_logged_links stored (Link, LinkType, str) tuples to deduplicate "Skipping link" debug messages. Because each Link hashes by URL, every entry was unique, so the dedupe never fired. The set just accumulated Link references, preventing GC of anything those Link objects were referencing.
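A minimal sketch of the failure mode described above, not pip's actual code. `FakeLink`, `logged_links`, and `log_skipped` are hypothetical stand-ins, and the link type is a plain string rather than pip's `LinkType`. It assumes only what the description says: the key hashes by URL, and every skipped link has a different URL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeLink:
    """Hypothetical stand-in for a link object; equality and hash use only the URL."""

    url: str
    # Data the link keeps alive; excluded from equality and hashing.
    payload: bytes = field(default=b"", compare=False, repr=False)


# Hypothetical dedupe set keyed on (link, link type, detail).
logged_links: set[tuple[FakeLink, str, str]] = set()


def log_skipped(link: FakeLink, link_type: str, detail: str) -> bool:
    """Return True if a message would be emitted, False if it was deduplicated."""
    entry = (link, link_type, detail)
    if entry in logged_links:
        return False
    logged_links.add(entry)
    return True


# Every link has a distinct URL, so every tuple is a distinct key.
emitted = [
    log_skipped(
        FakeLink(f"https://example.invalid/pkg-{i}.whl", b"x" * 1024),
        "requires_python_mismatch",
        "3.8",
    )
    for i in range(1000)
]

print(all(emitted))       # no call was ever deduplicated
print(len(logged_links))  # one retained entry (and link) per call
```

In this sketch the set grows by one entry per call and holds a strong reference to each link, so neither the links nor their payloads can be collected while the set is alive.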

As this never worked, I'm just removing it and keeping only a set[str] for Requires-Python skip reasons (the only data read back from the set).
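The shape of that change can be sketched as follows. This is an illustration under assumptions, not the merged diff: `SkipLog`, `record`, and `skipped_reasons` are hypothetical names, and the boolean flag stands in for whatever check pip uses to recognise a Requires-Python mismatch.

```python
from __future__ import annotations


class SkipLog:
    """Hypothetical holder that keeps only Requires-Python skip reasons as strings."""

    def __init__(self) -> None:
        self._requires_python_reasons: set[str] = set()

    def record(self, url: str, reason: str, is_requires_python: bool) -> None:
        # The URL is used for the debug message only and is never stored,
        # so nothing here keeps a link object alive.
        if is_requires_python:
            self._requires_python_reasons.add(reason)

    def skipped_reasons(self) -> list[str]:
        """Return the distinct Requires-Python reasons in a stable order."""
        return sorted(self._requires_python_reasons)


log = SkipLog()
for i in range(1000):
    log.record(f"https://example.invalid/pkg-{i}.whl", "requires Python >=3.12", True)
log.record("https://example.invalid/other.whl", "wrong platform", False)

print(log.skipped_reasons())
```

Because many links tend to share the same reason text, a set of strings stays small no matter how many links are skipped.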

Using the following large resolve as a test:

pip install --dry-run apache-airflow[amazon,celery,cncf-kubernetes,docker,elasticsearch,google,mysql,postgres,redis,slack,snowflake,ssh]==3.0.6 --uploaded-prior-to 2026-01-01T00:00:00Z

With this change, ~120k Link objects were no longer stored in the set, and peak memory went down from ~350 MiB to ~180 MiB.
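How the figures above were measured is not stated. As a rough illustration of why retaining references raises peak memory, here is a small sketch using the standard library's `tracemalloc`. The `peak_bytes` helper and the object sizes are assumptions chosen for the example and say nothing about pip's real numbers.

```python
import tracemalloc


def peak_bytes(retain: bool, count: int = 2000, size: int = 10_000) -> int:
    """Allocate `count` buffers and return the peak traced memory in bytes."""
    kept = []
    tracemalloc.start()
    for _ in range(count):
        obj = bytearray(size)
        if retain:
            # Mimics a long-lived collection holding every object it has seen.
            kept.append(obj)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


retained_peak = peak_bytes(retain=True)
dropped_peak = peak_bytes(retain=False)
print(f"retained: {retained_peak / 1024:.0f} KiB, dropped: {dropped_peak / 1024:.0f} KiB")
```

When the buffers are not retained, each one becomes collectable as soon as the next is created, so the peak stays close to the size of a single buffer or two.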

@notatallshaw notatallshaw added this to the 26.1 milestone Mar 7, 2026
@ichard26 ichard26 self-requested a review March 14, 2026 04:06

@ichard26 ichard26 left a comment

Member

Good find!

@notatallshaw notatallshaw merged commit d03b30b into pypa:main Apr 7, 2026
28 checks passed
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Apr 23, 2026

Development

Successfully merging this pull request may close these issues.

Pip memory usage for large cached install dominated by list of candiate pages
