RFC6874 IPv6 Zone Identifiers in urls not parsed correctly #5775

@ph1l1p139

Description

RFC6874 updates the RFC3986 URI syntax to support IPv6 zone identifiers. Zone identifiers are required when working with link-local IPv6 addresses on a machine with multiple network adapters.

RFC6874 updates the URI ABNF to include:

IP-literal = "[" ( IPv6address / IPv6addrz / IPvFuture ) "]"
ZoneID = 1*( unreserved / pct-encoded )
IPv6addrz = IPv6address "%25" ZoneID
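
As an illustration of the grammar above, here is a minimal sketch that builds an RFC6874-style URL from an address and a zone ID. The helper name `zone_url` is hypothetical and only the standard library is used; it is not part of requests or urllib3.

```python
import ipaddress
from urllib.parse import quote


def zone_url(address: str, zone_id: str, scheme: str = "http") -> str:
    """Build scheme://[address%25zone] as described by RFC6874 (hypothetical helper)."""
    # Raises ValueError if the address part is not a valid IPv6 address.
    ipaddress.IPv6Address(address)
    # ZoneID = 1*( unreserved / pct-encoded ): escape everything that is not unreserved.
    encoded_zone = quote(zone_id, safe="")
    # The "%" separating address and zone must itself be written as "%25".
    return f"{scheme}://[{address}%25{encoded_zone}]"


print(zone_url("fe80::1234", "37"))   # http://[fe80::1234%2537]
print(zone_url("fe80::1", "eth0"))    # http://[fe80::1%25eth0]
```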

Windows systems use integers for the zone identifiers (Linux systems often use interface names instead).

When performing a GET request to a link-local address with zone ID "37" formatted as per RFC6874 (e.g. http://[fe80::1234%2537]), the percent-encoding seems to be interpreted incorrectly: requests attempts to access fe80::12347, which is an invalid IPv6 address.

Expected Result

The GET request is sent to "fe80::1234%37".
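
For reference, the decoding I would expect can be sketched roughly as follows. `split_zone` is a hypothetical helper written for this report, not an existing requests or urllib3 function, and it assumes the host is a bracketed IP literal.

```python
from urllib.parse import unquote


def split_zone(host_literal: str):
    """Split "[addr%25zone]" into (addr, zone); zone is None if absent (hypothetical helper)."""
    inner = host_literal.strip("[]")
    # RFC6874 separates the address and the zone ID with a literal "%25".
    address, separator, zone = inner.partition("%25")
    return address, (unquote(zone) if separator else None)


print(split_zone("[fe80::1234%2537]"))  # ('fe80::1234', '37')
```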

Actual Result

urllib3 LocationParseError and requests.exceptions.InvalidURL exceptions are raised.

>>> r = requests.get("http://[fe80::1234%2537]")
Traceback (most recent call last):
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 412, in send
    conn = self.get_connection(request.url, proxies)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 315, in get_connection
    conn = self.poolmanager.connection_from_url(url)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\poolmanager.py", line 297, in connection_from_url
    u = parse_url(url)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\url.py", line 392, in parse_url
    return six.raise_from(LocationParseError(source_url), None)
  File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: http://[fe80::12347]/

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\username_removed\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 414, in send
    raise InvalidURL(e, request=request)
requests.exceptions.InvalidURL: Failed to parse: http://[fe80::12347]/

It seems that part of the zone ID is concatenated onto the IPv6 address and the % is removed altogether.

Reproduction Steps

import requests
r = requests.get("http://[fe80::1234%2537]")

Note that zone identifiers vary between PCs/OSs and sometimes across reboots, so it is unlikely that 37 will be a valid zone ID on most systems. However, the connection should then fail with "[WinError 10049] The requested address is not valid in its context" rather than with the InvalidURL exception shown above.

Other Notes

The correct URL can be accessed by escaping the zone ID twice, e.g.:

import requests
r = requests.get("http://[fe80::1234%252537]")

This attempts to access the correct address, fe80::1234%37. Perhaps percent-decoding is being performed twice?

Exact behaviour depends on the specific zone ID used, for instance some zone IDs work with or without the percent encoded % character:

  • r = requests.get("http://[fe80::1234%1]")
  • r = requests.get("http://[fe80::1234%251]")
  • r = requests.get("http://[fe80::1234%25251]")

All three attempt a GET request to fe80::1234%1.
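
One possible explanation that fits all of the observations above is a double decode: the "%25" is turned into a bare "%" first, and a later pass then decodes any remaining "%XX" sequence that maps to an unreserved character. The sketch below is only a model of that guess; the function names are hypothetical and this is not the actual requests or urllib3 code.

```python
import re
import string

# RFC3986 unreserved characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def strip_zone_escape(host: str) -> str:
    """Assumed step: turn the first "%25" into a bare "%"."""
    return host.replace("%25", "%", 1)


def decode_unreserved(host: str) -> str:
    """Assumed step: decode "%XX" only when it represents an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, host)


def model_pipeline(host: str) -> str:
    """Hypothetical order of operations: parse, re-quote, parse again."""
    return strip_zone_escape(decode_unreserved(strip_zone_escape(host)))


for host in (
    "[fe80::1234%2537]",    # -> [fe80::12347]   ("%37" is decoded to "7")
    "[fe80::1234%252537]",  # -> [fe80::1234%37]
    "[fe80::1234%1]",       # -> [fe80::1234%1]
    "[fe80::1234%251]",     # -> [fe80::1234%1]
    "[fe80::1234%25251]",   # -> [fe80::1234%1]
):
    print(host, "->", model_pipeline(host))
```

Under this model, a zone ID is only corrupted when its first two characters are hex digits that encode an unreserved character, which would explain why "37" breaks while "1" does not.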

urllib3 behaviour

urllib3 seems to perform in accordance with RFC6874:

>>> import urllib3
>>> urllib3.util.parse_url("http://[fe80::1234%2537]")
Url(scheme='http', auth=None, host='[fe80::1234%37]', port=None, path=None, query=None, fragment=None)

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "4.0.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.10"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.2"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.25.1"
  },
  "system_ssl": {
    "version": "1010109f"
  },
  "urllib3": {
    "version": "1.26.3"
  },
  "using_pyopenssl": false
}
