feat: cache url dependencies during dependency resolution #7595
ralbertazzi wants to merge 1 commit into python-poetry:master
src/poetry/puzzle/provider.py
Outdated
| with tempfile.TemporaryDirectory() as temp_dir: | ||
| dest = Path(temp_dir) / file_name | ||
| download_file(url, dest) | ||
| download_file(url, dest, session=self._url_authenticator, chunk_size=1024 * 1024) |
The increased `chunk_size` is needed to ensure a fast read from the file cache when there's a cache hit. The default `chunk_size = 1024` would make the read pretty slow.
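To make the effect of the chunk size concrete, here is a minimal sketch that counts how many reads a chunked copy needs. `copy_in_chunks` is a hypothetical helper written for illustration only; it is not Poetry's `download_file`, and the in-memory buffers stand in for the cached file.

```python
import io


def copy_in_chunks(src, dest, chunk_size):
    """Copy src to dest chunk by chunk; return the number of non-empty reads."""
    reads = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        reads += 1
    return reads


# A 4 MiB payload standing in for a cached artifact (illustrative size).
payload = bytes(4 * 1024 * 1024)

small_reads = copy_in_chunks(io.BytesIO(payload), io.BytesIO(), 1024)
large_reads = copy_in_chunks(io.BytesIO(payload), io.BytesIO(), 1024 * 1024)

print(small_reads, large_reads)
```

With 1 KiB chunks the loop body runs thousands of times for the same payload, which is the per-chunk overhead the larger `chunk_size` is meant to avoid.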
src/poetry/puzzle/provider.py
Outdated
| self._direct_origin_packages: dict[str, Package] = {} | ||
| self._locked: dict[NormalizedName, list[DependencyPackage]] = defaultdict(list) | ||
| self._use_latest: Collection[NormalizedName] = [] | ||
| self._url_authenticator = Authenticator(cache_id="url") |
This assumes that there is no repository called "url". If we think this could cause problems, we can also rename it to `_url`.
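The concern can be sketched as a namespace collision. The example below assumes, purely for illustration, that each cache id maps to its own directory under a common root; `CACHE_ROOT` and `cache_dir_for` are hypothetical names and do not describe how Poetry actually lays out its cache.

```python
from pathlib import PurePosixPath

# Hypothetical cache root, used only for this illustration.
CACHE_ROOT = PurePosixPath("/tmp/example-cache")


def cache_dir_for(cache_id: str) -> PurePosixPath:
    """Map a cache id to a directory (assumed layout, not Poetry's)."""
    return CACHE_ROOT / cache_id


url_cache = cache_dir_for("url")        # cache for URL dependencies
repo_named_url = cache_dir_for("url")   # a repository that happens to be called "url"
renamed = cache_dir_for("_url")         # the alternative id suggested above

collides = url_cache == repo_named_url
print(collides, renamed)
```

Under this assumed layout, a repository named "url" would share the directory with the URL cache, while `_url` would only collide with a repository of exactly that name.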
564b628 to dfefe2e
As already mentioned by dimbleby in the related issue and the other (closed) PR, I also think that using the artifacts cache is the way to go. That avoids copying to a temporary directory and not only reuses the cache during dependency resolution but also for installation. If anybody is motivated enough, I think the steps should be as follows:
If you want to split it up, steps 1-3 can be done in a first PR (refactoring only) and step 4 in a follow up PR (enhancement).
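The general idea of a persistent cache that is shared between resolution and installation can be sketched as follows. This is an assumption-laden illustration, not Poetry's artifact cache: `cached_fetch`, the SHA-256 keyed directory layout, `fake_fetch`, and the example URL are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_fetch(url, cache_dir, fetch):
    """Return a cached path for url, calling fetch(url) only on a cache miss."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    name = url.rsplit("/", 1)[-1]
    target = cache_dir / key / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(url))
    return target


calls = []


def fake_fetch(url):
    """Stand-in for a real download; records each call."""
    calls.append(url)
    return b"wheel-bytes"


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    url = "https://example.com/demo-1.0-py3-none-any.whl"
    first = cached_fetch(url, cache_dir, fake_fetch)   # miss: downloads
    second = cached_fetch(url, cache_dir, fake_fetch)  # hit: reuses the file
    same_path = first == second
    content = second.read_bytes()

print(len(calls), same_path)
```

Because the second call returns the already stored path, no copy to a temporary directory is needed, and a later consumer of the same cache directory could reuse the file as well.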
Hi @radoering, I tried to address your first 3 steps in this PR ;) #7621
Closed in favour of #7621
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Pull Request Check List
Resolves: #2415
This PR exploits the already existing `Authenticator` object with file-cache functionalities to create a dedicated `url` cache for URL dependencies. This cache is used at dependency resolution time.
Note that the solution still isn't perfect, as there's an extra file transfer performed by Poetry from the cached file to a temporary directory. Still, by setting an appropriate `chunk_size` the performance improvement is extremely visible even with huge dependencies (dear PyTorch), where the "download" goes from minutes to a few seconds.
Also, there's quite a bit of `disable_cache` forwarding around the codebase to make sure that the Authenticator's cache is disabled upon user request.
test_provider.py, while there are tests for most other kinds (git, file, ..)
[ ] Updated documentation for changed code. Not needed IMO