GH-78319: Stop sending the UTF8 marker when appending messages #107290
Open
arnt wants to merge 1 commit into python:main from
…to a mailbox.

The UTF8 marker is defined in RFC 6855 and tells the server that the message being appended contains UTF8 addresses, an unencoded UTF8 subject, etc. However, if a client appends a message containing UTF8 addresses but without that marker, the bytes can only be parsed as UTF8, because that is the only RFC-compliant way to parse those bytes.

RFC 6855 says clients MUST send the UTF8 marker.

Due to an accidental discrepancy, RFC 9051 (IMAP4rev2) does not contain that marker. IMAP4rev2 was intended to be upwardly compatible with RFC 6855, but this discrepancy broke that. Omitting the marker has no ill effects, since the marker does not change the message's meaning.

While investigating the problem, I noticed that Python uses the marker incorrectly: Python uses it to mark ALL messages if UTF8=ACCEPT support has been enabled, not just ones that contain UTF8 addresses.

The best way forward appears to be using the syntax defined in RFC 9051 and publishing a revision to RFC 6855, so this change modifies imaplib to match RFC 9051.

FWIW JMAP is like IMAP4rev2 in this case; UTF8 is just there, without any marker. Also, none of UTF8=ACCEPT, IMAP4rev2, or JMAP provides any way to learn whether a message was stored with or without the marker.

This quasi-accidentally solves python#78319 by removing the case that broke.
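To make the difference concrete, here is a rough sketch of the two APPEND forms as I read the grammars in RFC 6855 and RFC 9051. The helper names (`has_utf8_headers`, `append_with_marker`, `append_plain`) are made up for illustration and are not part of imaplib. The sketch builds the bytes in one piece, whereas a real client would wait for the server's continuation response after the literal size line.

```python
CRLF = b"\r\n"


def has_utf8_headers(message: bytes) -> bool:
    """Rough heuristic (an assumption, not imaplib behaviour):
    report whether the header block contains any non-ASCII byte."""
    header, _, _ = message.partition(CRLF + CRLF)
    return any(byte > 0x7F for byte in header)


def append_with_marker(tag: bytes, mailbox: bytes, message: bytes) -> bytes:
    """RFC 6855 style: the literal8 is wrapped as UTF8 (~{n} CRLF octets)."""
    command = b"%s APPEND %s UTF8 (~{%d}" % (tag, mailbox, len(message))
    return command + CRLF + message + b")" + CRLF


def append_plain(tag: bytes, mailbox: bytes, message: bytes) -> bytes:
    """RFC 9051 style: an ordinary literal, with no UTF8 marker at all."""
    command = b"%s APPEND %s {%d}" % (tag, mailbox, len(message))
    return command + CRLF + message + CRLF


if __name__ == "__main__":
    msg = b"From: j\xc3\xb8rn@example.com\r\nSubject: hi\r\n\r\nbody\r\n"
    print(has_utf8_headers(msg))
    print(append_with_marker(b"A1", b"INBOX", msg))
    print(append_plain(b"A1", b"INBOX", msg))
```

With this change, imaplib is meant to send only the second form, whether or not the message has UTF8 headers, so a check like `has_utf8_headers` is not needed on the append path.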
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool.
Author
This seems small enough to not require a NEWS entry, I'd say.
Author
I have now submitted the Internet-Draft to supersede RFC 6855 and expect it to become an RFC fairly quickly.