@tipabu tipabu commented Jun 3, 2019

Previously, when http.client parsed a response from an out-of-spec server that sent a header with a non-ASCII name, email.feedparser would assume that the non-compliant header must be part of a message body and stop parsing headers. However, http.client had already determined the boundary between headers and body and passed only the headers to the parser. As a result, any headers after the first non-compliant one would be silently (!) ignored. This could include headers important for message framing, like Content-Length and Transfer-Encoding.
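The failure mode described above can be reproduced with the email parser alone. The following is a minimal sketch, not code from this patch: the header names and values are made up for illustration, and the input is a `str` because http.client decodes the header bytes as ISO-8859-1 before handing them to the parser.

```python
from email.parser import Parser

# A header block only, as http.client would pass it to the parser.
# "\xd1" is a non-ASCII character inside the second header's name.
# (Header names and values here are illustrative.)
raw_headers = (
    "Server: example\r\n"
    "X-\xd1ame: value\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
)

msg = Parser().parsestr(raw_headers)

# Only the headers before the non-compliant line are recognized.
print(msg.keys())

# The framing header after it is missing from the parsed headers...
print(msg["Content-Length"])

# ...because the parser treated everything from the non-compliant
# line onward as the message body.
print(repr(msg.get_payload()))
```

With the default parser behavior, `Content-Length` ends up in the payload rather than in the header list, which is the silent loss this change targets.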

In the long-long ago, this parsing was handled by the rfc822 module, which didn't care about which bytes were in the header as long as there was a colon in the line.

Now, add an optional argument to the email parsers to decide whether to require strict RFC-compliant header names. Default this to True to minimize the possibility of breaking other callers. In http.client, which already knows where the headers end and body begins, use False.

Note that the non-ASCII names will be decoded as ISO-8859-1 in keeping with how header values are decoded.
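For comparison, the lenient behavior described above (accept any line that contains a colon, and decode names the same way as values) can be sketched in a few lines. This is an illustration only: `parse_headers_leniently` is a hypothetical helper, not the argument added by this patch and not the rfc822 implementation, and it ignores folded continuation lines for brevity.

```python
def parse_headers_leniently(raw: bytes) -> list[tuple[str, str]]:
    """Split an already-delimited header block into (name, value) pairs.

    Hypothetical helper for illustration: any line with a colon is
    accepted as a header, whatever bytes appear in its name.
    """
    headers = []
    for line in raw.split(b"\r\n"):
        if not line:
            break  # blank line marks the end of the header block
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # no colon, so not a header line
        # Decode names as ISO-8859-1, matching how values are decoded.
        headers.append(
            (name.decode("iso-8859-1"), value.strip().decode("iso-8859-1"))
        )
    return headers


raw = b"Server: example\r\nX-\xd1ame: value\r\nContent-Length: 5\r\n\r\n"
parsed = parse_headers_leniently(raw)
print(parsed)
```

Because ISO-8859-1 maps every byte to a code point, the decode step cannot fail, and the headers after the non-ASCII name survive.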

https://bugs.python.org/issue37093

@ZackerySpytz ZackerySpytz left a comment

Please update the documentation.

Contributor

Please limit lines to 79 characters (PEP 8).

Contributor Author

There's still a litany of line-length violations in Lib/http/client.py and Lib/test/test_httplib.py, but I think now at least I'm not making things any worse.

Contributor

:mod:`http.client`

Contributor Author

Done.
