RFC6874 updates the RFC3986 URI syntax to support IPv6 zone identifiers. Zone identifiers are required when working with link-local IPv6 addresses on a machine with multiple network adapters.
RFC6874 updates the URI ABNF to include:
IP-literal = "[" ( IPv6address / IPv6addrz / IPvFuture ) "]"
ZoneID = 1*( unreserved / pct-encoded )
IPv6addrz = IPv6address "%25" ZoneID
Windows systems use integers for the zone identifiers (Linux systems often use interface names instead).
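At the socket level, a zone ID ultimately corresponds to the integer scope_id in the (host, port, flowinfo, scope_id) tuple used for AF_INET6 sockets. The sketch below is only an illustration of that mapping, not how requests or urllib3 handle it; both helper names are hypothetical. Numeric zone IDs (as on Windows) are used directly, and interface names (as often seen on Linux) are looked up with socket.if_nametoindex.

```python
import socket
from typing import Tuple


def zone_to_scope_id(zone: str) -> int:
    """Map a zone ID to a numeric interface index (hypothetical helper)."""
    # Windows-style zone IDs are already numeric interface indices, e.g. "37".
    if zone.isdecimal():
        return int(zone)
    # Interface names such as "eth0" are resolved by the OS;
    # raises OSError if no such interface exists.
    return socket.if_nametoindex(zone)


def link_local_sockaddr(addr: str, port: int, zone: str) -> Tuple[str, int, int, int]:
    """Build an AF_INET6 sockaddr tuple: (host, port, flowinfo, scope_id)."""
    return (addr, port, 0, zone_to_scope_id(zone))
```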
When performing a GET request to a link-local address with zone ID "37" formatted as per RFC6874 (e.g. http://[fe80::1234%2537]), the percent-encoding appears to be decoded incorrectly, and requests attempts to access fe80::12347, which is an invalid IPv6 address.
Expected Result
A GET request to "fe80::1234%37" is performed correctly.
Actual Result
urllib3 LocationParseError and requests.exceptions.InvalidURL exceptions are raised:
>>> r = requests.get("http://[fe80::1234%2537]")
Traceback (most recent call last):
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 412, in send
conn = self.get_connection(request.url, proxies)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 315, in get_connection
conn = self.poolmanager.connection_from_url(url)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\poolmanager.py", line 297, in connection_from_url
u = parse_url(url)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\url.py", line 392, in parse_url
return six.raise_from(LocationParseError(source_url), None)
File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: http://[fe80::12347]/
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 414, in send
raise InvalidURL(e, request=request)
requests.exceptions.InvalidURL: Failed to parse: http://[fe80::12347]/
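It appears that the zone ID is percent-decoded twice: "%25" is first decoded to "%", and the resulting "%37" is then decoded again to "7" (0x37 is the ASCII code for "7"), which is concatenated onto the IPv6 address while the "%" separator disappears altogether.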
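The suspected double decoding can be reproduced with the standard library alone. This sketch uses urllib.parse.unquote as a stand-in for whatever decoding requests performs internally (an assumption; it is not requests' actual code path), plus the ipaddress module, whose scope_id support requires Python 3.9 or later.

```python
import ipaddress
from urllib.parse import unquote

host = "fe80::1234%2537"  # RFC 6874 form: "%25" is the encoded "%" separator
once = unquote(host)       # correct single decode
twice = unquote(once)      # second, erroneous decode (model of the suspected bug)

print(once)   # fe80::1234%37
print(twice)  # fe80::12347  ("%37" is the percent-encoding of "7")

# The once-decoded form is a valid scoped address; the twice-decoded one is not.
print(ipaddress.IPv6Address(once).scope_id)  # 37
try:
    ipaddress.IPv6Address(twice)
except ValueError as exc:
    print("invalid:", exc)
```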
Reproduction Steps
import requests
r = requests.get("http://[fe80::1234%2537]")
Note that zone identifiers vary between PCs/OSs and sometimes across reboots, so 37 is unlikely to be a valid zone ID on most systems. However, the connection should then fail with "[WinError 10049] The requested address is not valid in its context" rather than with the URL parsing errors shown above.
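To pick a zone ID that actually exists on the test machine, the local interfaces can be listed with socket.if_nameindex() (available on Windows from Python 3.8). The candidate_urls helper is hypothetical and simply formats RFC6874 URLs using each interface's numeric index, the form Windows uses for zone IDs; fe80::1234 is the placeholder address from the example above.

```python
import socket
from typing import Iterable, List, Tuple


def candidate_urls(interfaces: Iterable[Tuple[int, str]], addr: str = "fe80::1234") -> List[str]:
    # One RFC 6874 URL per interface, using the numeric index as the zone ID;
    # "%25" is the percent-encoded "%" separator.
    return [f"http://[{addr}%25{index}]" for index, _name in interfaces]


if __name__ == "__main__":
    try:
        interfaces = socket.if_nameindex()  # list of (index, name) pairs
    except OSError as exc:
        print("could not enumerate interfaces:", exc)
    else:
        for (index, name), url in zip(interfaces, candidate_urls(interfaces)):
            print(f"{index:>3} {name}: {url}")
```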
Other Notes
The correct address can be accessed by escaping the zone ID twice, e.g.:
import requests
r = requests.get("http://[fe80::1234%252537]")
This attempts to access the correct address, fe80::1234%37; perhaps percent-decoding is being performed twice?
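Under the same double-decoding assumption, the workaround succeeds because the extra %25 is consumed by the second pass. The helpers rfc6874_url and double_escaped_url below are hypothetical and only build the two URL forms for a given address and zone ID; the final check models the suspected behaviour with urllib.parse.unquote, not with requests itself.

```python
from urllib.parse import quote, unquote


def rfc6874_url(addr: str, zone: str) -> str:
    # Standard RFC 6874 form: "%" separator written as "%25", zone ID pct-encoded.
    return f"http://[{addr}%25{quote(zone, safe='')}]"


def double_escaped_url(addr: str, zone: str) -> str:
    # Workaround form: percent-encode "%25<zone>" once more, so that one extra
    # decoding pass still leaves "%<zone>".
    return f"http://[{addr}{quote('%25' + quote(zone, safe=''), safe='')}]"


url = double_escaped_url("fe80::1234", "37")
print(url)                    # http://[fe80::1234%252537]
print(unquote(unquote(url)))  # http://[fe80::1234%37]
```

Because the workaround depends on the suspected bug, double-escaped URLs would presumably stop working once the decoding is fixed, so rfc6874_url is the form to prefer in principle.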
The exact behaviour depends on the specific zone ID used; for instance, some zone IDs work with or without the percent-encoded % character:
- r = requests.get("http://[fe80::1234%1]")
- r = requests.get("http://[fe80::1234%251]")
- r = requests.get("http://[fe80::1234%25251]")
All three attempt a GET request to fe80::1234%1.
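One plausible explanation for this zone-dependent behaviour (an assumption, not confirmed against the requests source) is a normalization pass in the style of RFC3986 section 6.2.2.2, which decodes only those %XX triplets that encode unreserved characters. Once %2537 has been decoded to %37, such a pass turns %37 into "7"; %1] is not a complete triplet, and %25 encodes "%", which is not an unreserved character, so both survive. normalize_unreserved is a hypothetical helper written for this illustration.

```python
import re
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def normalize_unreserved(text: str) -> str:
    """Decode only %XX triplets that encode an unreserved character."""
    def _decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", _decode, text)


for host in ("[fe80::1234%37]", "[fe80::1234%1]", "[fe80::1234%251]"):
    print(host, "->", normalize_unreserved(host))
# [fe80::1234%37] -> [fe80::12347]
# [fe80::1234%1] -> [fe80::1234%1]
# [fe80::1234%251] -> [fe80::1234%251]
```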
urllib3 behaviour
urllib3 on its own appears to behave in accordance with RFC6874:
>>> import urllib3
>>> urllib3.util.parse_url("http://[fe80::1234%2537]")
Url(scheme='http', auth=None, host='[fe80::1234%37]', port=None, path=None, query=None, fragment=None)
System Information
$ python -m requests.help
{
"chardet": {
"version": "4.0.0"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "2.10"
},
"implementation": {
"name": "CPython",
"version": "3.9.2"
},
"platform": {
"release": "10",
"system": "Windows"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.25.1"
},
"system_ssl": {
"version": "1010109f"
},
"urllib3": {
"version": "1.26.3"
},
"using_pyopenssl": false
}