Skip to content

feat: cache url dependencies during dependency resolution#7595

Closed
ralbertazzi wants to merge 1 commit intopython-poetry:masterfrom
ralbertazzi:patch-1
Closed

feat: cache url dependencies during dependency resolution#7595
ralbertazzi wants to merge 1 commit intopython-poetry:masterfrom
ralbertazzi:patch-1

Conversation

@ralbertazzi
Copy link
Contributor

@ralbertazzi ralbertazzi commented Mar 3, 2023

Pull Request Check List

Resolves: #2415

This PR exploits the already existing Authenticator object with file-cache functionalities to create a dedicated url cache for URL dependencies. Such cache is used at dependency resolution time.

Note how the solution isn't still perfect, as there's an extra file transfer performed by Poetry from the cached file to a temporary directory. Still, by setting an appropriate chunk_size the performance improvement is extremely visible even with huge dependencies (dear PyTorch) where the "download' goes from minutes to a few seconds.

Also, there's quite a disable_cache forwarding around the codebase to make sure that the Authenticator's cache is disabled upon user request.

  • Added tests for changed code. -> Might need support for it. Also there's currently no tests for URL dependencies in test_provider.py, while there are tests for most other kinds (git, file, ..)
  • [ ] Updated documentation for changed code. Not needed IMO

with tempfile.TemporaryDirectory() as temp_dir:
dest = Path(temp_dir) / file_name
download_file(url, dest)
download_file(url, dest, session=self._url_authenticator, chunk_size=1024 * 1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The increased chunk_size is needed to ensure a fast read from the file cache when there's a cache hit. The default chunk_size = 1024 would make the read pretty slow

self._direct_origin_packages: dict[str, Package] = {}
self._locked: dict[NormalizedName, list[DependencyPackage]] = defaultdict(list)
self._use_latest: Collection[NormalizedName] = []
self._url_authenticator = Authenticator(cache_id="url")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that there should be no repository called "url". If we think this could cause potential problems we can also rename it to _url

@ralbertazzi ralbertazzi force-pushed the patch-1 branch 3 times, most recently from 564b628 to dfefe2e Compare March 3, 2023 11:23
@radoering
Copy link
Member

As already mentioned by dimbleby in the related issue and the other (closed) PR, I also think that using the artifacts cache is the way to go. That avoids copying to a temporary directory and not only reuses the cache during dependency resolution but also for installation.

If anybody is motivated enough, I think the steps should be as follows:

  1. make get_cached_archives_for_link() independent from interpreter_name and interpreter_version
  2. add an artifacts_cache_directory property to config.py analoguous to repository_cache_directory()
  3. move get_cached_archives_for_link() and get_cache_directory_for_link() from chef.py to utils/cache.py
  4. adapt get_package_from_url() in provider.py to use the artifacts cache

If you want to split it up, steps 1-3 can be done in a first PR (refactoring only) and step 4 in a follow up PR (enhancement).

@ralbertazzi
Copy link
Contributor Author

Hi @radoering , I tried to address your first 3 steps in this PR ;) #7621

@ralbertazzi
Copy link
Contributor Author

Closed in favour of #7621

@ralbertazzi ralbertazzi deleted the patch-1 branch March 13, 2023 17:18
@github-actions
Copy link

github-actions bot commented Mar 3, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Poetry downloading same wheels multiple times within a single invocation

2 participants