feature: cache downloaded wheel information#2276
Conversation
| def add_wheel_to_update_log(wheel, for_py_version, app_data): | ||
| embed_update_log = app_data.embed_update_log(wheel.distribution, for_py_version) | ||
| logging.debug("adding %s information to %s", wheel.name, embed_update_log.file) | ||
| u_log = UpdateLog.from_dict(embed_update_log.read()) | ||
| if any(version.filename == wheel.name for version in u_log.versions): | ||
| logging.warning("%s already present in %s", wheel.name, embed_update_log.file) | ||
| return | ||
| version = NewVersion(wheel.name, datetime.now(), release_date_for_wheel_path(wheel.path), "download") | ||
| u_log.versions.append(version) # always write at the end for proper updates | ||
| embed_update_log.write(u_log.to_dict()) | ||
|
|
||
|
|
There was a problem hiding this comment.
This has been added to periodic_update.py as it is the only place dealing with the embed_update_log (which is not used only for updates anymore).
| now = datetime.now() | ||
| if periodic: | ||
| source = "periodic" | ||
| # mark everything not updated manually as source "periodic" |
There was a problem hiding this comment.
All the modifications in _run_do_update are what lead me to bump the JSON embed format version.
| # we need to sort versions for the next update to work properly | ||
| u_log.versions.sort(reverse=True) |
There was a problem hiding this comment.
The log needs to be sorted. It was sorted by design before. That's not the case anymore.
There was a problem hiding this comment.
Can you explain why is no longer the case?
There was a problem hiding this comment.
This would remove the need to sort the list.
I don't mind sorting the log; why would you consider the other solution less risky?
There was a problem hiding this comment.
Can you explain why is no longer the case?
because other sources logs are always appended rather than inserted.
I don't mind sorting the log; why would you consider the other solution less risky?
I'm feeling a bit hesitant to rely on the tuple ordering given:
def as_version_tuple(version):
result = []
for part in version.split(".")[0:3]:
try:
result.append(int(part))
except ValueError:
break
if not result:
raise ValueError(version)
return tuple(result)
I feel that the inner try/catch is hiding something I did not think about.
There was a problem hiding this comment.
I did not think of pre-release versions... In this specific sample, it does not matter much but it's not sorted correctly.
{
"completed": "2022-01-01T21:48:50.557509Z",
"periodic": false,
"started": "2022-01-01T21:48:40.230835Z",
"versions": [
{
"filename": "pip-21.3.1-py3-none-any.whl",
"found_date": "2022-01-01T14:22:37.717326Z",
"release_date": "2021-10-22T15:57:09.000000Z",
"source": "manual"
},
{
"filename": "pip-20.3-py2.py3-none-any.whl",
"found_date": "2022-01-01T21:46:25.286240Z",
"release_date": "2020-11-30T12:47:49.000000Z",
"source": "manual"
},
{
"filename": "pip-20.0-py2.py3-none-any.whl",
"found_date": "2022-01-01T21:48:23.634459Z",
"release_date": "2020-01-21T11:42:56.000000Z",
"source": "manual"
},
{
"filename": "pip-20.3b1-py2.py3-none-any.whl",
"found_date": "2022-01-01T21:44:30.798536Z",
"release_date": "2020-10-31T19:25:12.000000Z",
"source": "manual"
}
]
}
Further-more, a manual/periodic update should never lead to defaulting to pre-release version. This promise is broken (at least for manual updates).
There was a problem hiding this comment.
Given an added test with a pre-release version downloaded as "pinned" version indeed broke manual updates, I went with the alternate implementation of _run_do_update which first splits the UpdateLog in 2 depending if it's coming from a previous update or another source.
|
Edit: this is used by the latest commit There's another way to do the update which might be a less bit risky for periodic updates:
This would remove the need to sort the list. |
|
Edit: this is no longer the case with the latest push. |
|
Writing things down makes me realize that there are still some things left to do so converting to draft. PS: while thinking about the time penalty I mentioned, I thought about caching PyPI requests on disk as well and use the |
Add downloaded wheel information in the relevant JSON embed file to prevent additional downloads of the same wheel. This requires to bump the JSON embed file version.
|
I rewrote the |
|
|
||
|
|
||
| def test_download_manual_stop_at_first_usable(tmp_path, mocker, freezer): | ||
| def test_download_periodic_stop_at_first_usable2(tmp_path, mocker, freezer): |
There was a problem hiding this comment.
there's already a test_download_periodic_stop_at_first_usable test and I didn't see how to do this as a parametrize test easily so adding a suffix 2 on the second one.
There was a problem hiding this comment.
Instead of magic numbers, I generally add to the test name what differentiates them 🤔 for future reference.
Signed-off-by: Bernát Gábor <gaborjbernat@gmail.com>
Add downloaded wheel information in the relevant JSON embed file to prevent additional downloads of the same wheel.
This requires to bump the JSON embed file version once again as I was lacking some tests cases in #2273.
I will add some inline comments for some choices that I made.
Fixes #2268
Thanks for contributing, make sure you address all the checklists (for details on how see
development documentation)!
tox -e fix_lint)docs/changelogfolder