Skip to content

url_exists related improvements#34095

Merged
tgamblin merged 4 commits intospack:developfrom
haampie:fix/url-things
Nov 23, 2022
Merged

url_exists related improvements#34095
tgamblin merged 4 commits intospack:developfrom
haampie:fix/url-things

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Nov 23, 2022

For reasons beyond me Python thinks it's a great idea to upgrade HEAD
requests to GET requests when following redirects. So, this PR adds a
better HTTPRedirectHandler, and also moves some ad-hoc logic around
for dealing with disabling SSL certs verification.

Also, I'm stumped by the fact that Spack's url_exists does not use
HEAD requests at all, so in certain cases Spack awkwardly downloads
something first to see if it can download it, and then downloads it
again because it knows it can download it. So, this PR ensures that both
urllib and botocore use HEAD requests.

Finally, it also removes some things that were there to support currently
unsupported Python versions.

Notice that the HTTP spec section 10.3.2 just talks about how to deal
with POST request on redirect (whether to follow or not):

If the 301 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.

Note: When automatically redirecting a POST request after
receiving a 301 status code, some existing HTTP/1.0 user agents
will erroneously change it into a GET request.

Python has a comment about this, they choose to go with the "erroneous change".
But they then mess up the HEAD request while following the redirect, probably
because they were too busy discussing how to deal with POST.

See python/cpython#99731

@spackbot-app spackbot-app bot added core PR affects Spack core functionality fetching tests General test capability(ies) utilities labels Nov 23, 2022
@haampie haampie changed the title urlopen related improvements url_exists related improvements Nov 23, 2022
@haampie haampie force-pushed the fix/url-things branch 5 times, most recently from 76aaa85 to a1a2f98 Compare November 23, 2022 16:33
For reasons beyond me Python thinks it's a great idea to upgrade HEAD
requests to GET requests when following redirects. So, this PR adds a
better `HTTPRedirectHandler`, and also moves some ad-hoc logic around
for dealing with disable ssl cert verification.

Also, I'm stumped by the fact that Spack's `url_exists` does not use
HEAD requests at all, so in certain cases Spack awkwardly downloads
something first to see if it can download it, and then downloads it
because it know it can download it. So, this PR ensures that both urllib
and botocore use HEAD requests.
@haampie
Copy link
Copy Markdown
Member Author

haampie commented Nov 23, 2022

@spackbot fix style

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Nov 23, 2022

Let me see if I can fix that for you!

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Nov 23, 2022

I was able to run spack style --fix for you!

spack style --fix
==> Running style checks on spack
  selected: isort, mypy, black, flake8
==> Modified files
  lib/spack/spack/test/web.py
  lib/spack/spack/util/web.py
==> Running isort checks
Fixing /tmp/tmp3yze3ko_/spack/lib/spack/spack/util/web.py
  isort checks were clean
==> Running mypy checks
lib/spack/llnl/util/lang.py:1064: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
lib/spack/llnl/util/lock.py:84: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
lib/spack/spack/util/timer.py:68: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
lib/spack/spack/bootstrap.py:87: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
Success: no issues found in 565 source files
  mypy checks were clean
==> Running black checks
All done! ✨ 🍰 ✨
2 files left unchanged.
  black checks were clean
==> Running flake8 checks
  flake8 checks were clean
==> spack style checks were clean
Keep in mind that I cannot fix your flake8 or mypy errors, so if you have any you'll need to fix them and update the pull request. If I was able to push to your branch, if you make further changes you will need to pull from your updated branch before pushing again.

I've updated the branch with style fixes.

@tgamblin tgamblin enabled auto-merge (squash) November 23, 2022 19:00
@tgamblin tgamblin merged commit d06fd26 into spack:develop Nov 23, 2022
@haampie haampie deleted the fix/url-things branch November 23, 2022 21:37
haampie added a commit that referenced this pull request Nov 26, 2022
haampie pushed a commit that referenced this pull request Nov 26, 2022
This reverts commit d06fd26.

The problem is that Bitbucket's API forwards download requests to an S3 bucket using a temporary URL. This URL includes a signature for the request, which embeds the HTTP verb. That means only GET requests are allowed, and HEAD requests would fail verification, leading to 403 erros. The same is observed when using `curl -LI ...`
amd-toolchain-support pushed a commit to amd-toolchain-support/spack that referenced this pull request Feb 16, 2023
For reasons beyond me Python thinks it's a great idea to upgrade HEAD
requests to GET requests when following redirects. So, this PR adds a
better `HTTPRedirectHandler`, and also moves some ad-hoc logic around
for dealing with disabling SSL certs verification.

Also, I'm stumped by the fact that Spack's `url_exists` does not use
HEAD requests at all, so in certain cases Spack awkwardly downloads
something first to see if it can download it, and then downloads it
again because it knows it can download it. So, this PR ensures that both
urllib and botocore use HEAD requests.

Finally, it also removes some things that were there to support currently
unsupported Python versions.

Notice that the HTTP spec [section 10.3.2](https://datatracker.ietf.org/doc/html/rfc2616.html#section-10.3.2) just talks about how to deal
with POST request on redirect (whether to follow or not):

>   If the 301 status code is received in response to a request other
>   than GET or HEAD, the user agent MUST NOT automatically redirect the
>   request unless it can be confirmed by the user, since this might
>   change the conditions under which the request was issued.

>   Note: When automatically redirecting a POST request after
>   receiving a 301 status code, some existing HTTP/1.0 user agents
>   will erroneously change it into a GET request.

Python has a comment about this, they choose to go with the "erroneous change".
But they then mess up the HEAD request while following the redirect, probably
because they were too busy discussing how to deal with POST.

See python/cpython#99731
amd-toolchain-support pushed a commit to amd-toolchain-support/spack that referenced this pull request Feb 16, 2023
This reverts commit d06fd26.

The problem is that Bitbucket's API forwards download requests to an S3 bucket using a temporary URL. This URL includes a signature for the request, which embeds the HTTP verb. That means only GET requests are allowed, and HEAD requests would fail verification, leading to 403 erros. The same is observed when using `curl -LI ...`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core PR affects Spack core functionality fetching tests General test capability(ies) utilities

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants