bpo-37093: Allow http.client to parse non-ASCII header names #13788
ZackerySpytz
left a comment
Please update the documentation.
Lib/email/parser.py
Outdated
Please limit lines to 79 characters (PEP 8).
There's still a litany of line-length violations in Lib/http/client.py and Lib/test/test_httplib.py, but I think now at least I'm not making things any worse.
:mod:`http.client`
Done.
Previously, when `http.client` tried to parse a response from an out-of-spec server that sent a header with a non-ASCII name, `email.feedparser` would assume that the non-compliant header must be part of a message body and abort parsing. However, `http.client` already determined the boundary between headers and body and only passed the headers to the parser. As a result, any headers after the first non-compliant one would be silently (!) ignored. This could include headers important for message framing like `Content-Length` and `Transfer-Encoding`.

In the long-long ago, this parsing was handled by the `rfc822` module, which didn't care about which bytes were in the header as long as there was a colon in the line.

Now, add an optional argument to the email parsers to decide whether to require strict RFC-compliant header names. Default this to `True` to minimize the possibility of breaking other callers. In `http.client`, which already knows where the headers end and body begins, use `False`.

Note that the non-ASCII names will be decoded as ISO-8859-1 in keeping with how header values are decoded.
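
For readers who want to see the failure mode described above, here is a minimal sketch. It assumes a CPython build without this change. The header block, the `X-Naïve` header name, and both helper functions are hypothetical and exist only for illustration. `parse_leniently` is not the code from this PR; it only imitates the "anything with a colon" behaviour attributed to `rfc822` and ignores folded continuation lines.

```python
from email.parser import Parser

# Hypothetical header block from an out-of-spec server. The second header
# name contains a non-ASCII byte (0xEF, "ï" in ISO-8859-1).
RAW_HEADERS = (
    b"Server: example\r\n"
    b"X-Na\xefve: 1\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
)


def parse_with_email_parser(raw):
    """Decode as ISO-8859-1 and hand the text to the stock email parser."""
    return Parser().parsestr(raw.decode("iso-8859-1"))


def parse_leniently(raw):
    """Hypothetical helper: accept any line that contains a colon.

    Illustration only; it does not handle folded continuation lines.
    """
    headers = []
    for line in raw.split(b"\r\n"):
        if not line:
            break  # a blank line ends the header block
        name, sep, value = line.decode("iso-8859-1").partition(":")
        if sep:
            headers.append((name, value.strip()))
    return headers


strict = parse_with_email_parser(RAW_HEADERS)
lenient = parse_leniently(RAW_HEADERS)

print(strict.keys())             # header names the email parser kept
print(strict["Content-Length"])  # framing header after the bad line
print([name for name, _ in lenient])
```

Without this change, I would expect the email parser to keep only `Server` and to report no `Content-Length` at all, while the lenient helper returns all three names. That missing framing header is the silent loss this PR is meant to prevent.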
https://bugs.python.org/issue37093