Removed charset information from the docs in 4a871c1
This deprecates everywhere that accepted bytes when str was the correct type. It also deprecates the `charset` and `errors` parameters throughout the code. Closes #2602, which has more of my reasoning and notes.
In general, I looked for everywhere that was using instance checks, encoding, decoding, or internal helper functions. Many places were not using them consistently already. The modern HTML, URL, and HTTP standards all strongly suggest or require using UTF-8. Despite that, browsers will still accept HTML documents with other encodings, and send data from such documents in that encoding. However, 98% of websites use UTF-8, and as long as we send UTF-8 documents we will receive UTF-8 data.
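To make the encoding mismatch concrete, here is a minimal sketch using only the standard library's `urllib.parse`, not Werkzeug's own parsing code. The sample value and variable names are illustrative assumptions. It shows how the same form value is percent-encoded differently depending on the document encoding, and why only the UTF-8 form decodes cleanly as UTF-8.

```python
from urllib.parse import quote, unquote

value = "café"

# A form on a UTF-8 page percent-encodes the UTF-8 bytes of the value.
utf8_form = quote(value, encoding="utf-8")
# A form on an ISO-8859-1 page would percent-encode the Latin-1 bytes instead.
latin1_form = quote(value, encoding="latin-1")

print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9

# Decoding as UTF-8 round-trips the data that was sent as UTF-8.
decoded_utf8 = unquote(utf8_form, encoding="utf-8", errors="strict")

# The Latin-1 bytes are not valid UTF-8, so strict decoding fails.
try:
    unquote(latin1_form, encoding="utf-8", errors="strict")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(decoded_utf8)          # café
print(latin1_is_valid_utf8)  # False
```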
Ideally, this provides a speedup once all the deprecations are removed, but this is hard to test. Anecdotally, I talked to devs who noticed a corresponding slowdown when Python 3 support was initially added to the Python 2-only code. Many places were doing instance checks or excessive encoding/decoding multiple times while handling a request or creating a response.
These notes are in the order that GitHub displays the changes in the diff, except for the first one, which is probably the biggest change.
- `Request.charset`, `Request.encoding_errors`, `Request.url_charset`, `Response.charset`, and `Client.charset` attributes are deprecated. Request body, form, and cookie data will always be decoded using UTF-8. Response body data will always be encoded using UTF-8. That doesn't mean it's impossible to send or receive non-UTF-8 data, since it's still possible to get and set bytes directly.
- `_to_str` and `_to_bytes` functions, leftover from 2/3 compatibility.
- `_encode_idna`, it could be replaced with `str.encode("idna")`.
- `_decode_idna` still exists, only for use in `iri_to_uri`. It allows leaving invalid segments IDNA-encoded. I added a fast path where it tries to decode the full host first, before falling back to decoding each segment.
- `as_bytes` parameter when getting header values is deprecated, values are always returned as strings.
- `multipart/form-data`, and some headers, have a way to specify a charset for individual fields. Both these parsers only allow ASCII, UTF-8, and ISO-8859-1.
- `request.args`, and `application/x-www-form-urlencoded`, keep invalid bytes percent encoded rather than replacing them with a placeholder character.
- `Map.charset` is deprecated. Technically, it allowed building URLs that percent-encoded values as something besides UTF-8 bytes. However, Flask never exposed this option, so the vast, vast majority of devs have never been able to build routes with non-UTF-8 encoding anyway, even if they changed the response charset.
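
The IDNA notes above can be illustrated with a rough sketch built on Python's built-in `idna` codec. This is not the code from the PR: `decode_host` is a hypothetical name, and the real `_decode_idna` may differ in its details. It only shows the shape of the approach, which is to try the whole host first and then fall back to decoding segment by segment, leaving any segment that fails to decode in its encoded form.

```python
def decode_host(host: str) -> str:
    """Decode an IDNA-encoded host, leaving undecodable segments as-is.

    Hypothetical helper for illustration; not Werkzeug's implementation.
    """
    try:
        # Fast path: decode the whole host in one call.
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        pass

    parts = []
    # Slow path: decode each dot-separated segment on its own.
    for part in host.split("."):
        try:
            parts.append(part.encode("ascii").decode("idna"))
        except UnicodeError:
            # Keep the segment exactly as it was given.
            parts.append(part)
    return ".".join(parts)


# Encoding needs no helper: the built-in codec is enough.
encoded = "bücher.example".encode("idna").decode("ascii")
print(encoded)               # xn--bcher-kva.example
print(decode_host(encoded))  # bücher.example
```

Catching `UnicodeError` covers both a failed ASCII encode and a failed IDNA decode, so one bad segment does not prevent the others from being decoded.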