
gh-85679: Recommend encoding="utf-8" in tutorial #91778

Merged
methane merged 2 commits into python:main from methane:tutorial-utf8
May 2, 2022

Conversation

@methane methane commented Apr 21, 2022

Member

Fixes #85679

@methane added the docs, skip news, needs backport to 3.9, and needs backport to 3.10 labels Apr 21, 2022
@methane changed the title from bpo-85679: Use encoding="utf-8" in tutorial to gh-85679: Use encoding="utf-8" in tutorial Apr 21, 2022
Comment thread Doc/tutorial/inputoutput.rst Outdated
If *encoding* is not specified, the default is platform dependent
(see :func:`open`).
But passing ``encoding="utf-8"`` is highly recommended because
UTF-8 is the most commonly used encoding for now.

Member


General rule: don't start a sentence with words like "and", "but", "so", or "then", and try not to end with "for now".

Explicitly passing ``encoding='utf-8'`` is recommended if that is what you need as it is the most common text encoding in the world and leaves no room for doubt about your code's intent.

Perhaps.

@methane methane Apr 21, 2022

Member Author


This is a tutorial, and readers won't know what they need.
I want to teach that UTF-8 is the first choice.

How about this?

``encoding="utf-8"`` is recommended unless you need to use other encoding
because UTF-8 is the de-facto standard nowadays.

Member


How about:
Because UTF-8 is the modern de-facto standard, ``encoding="utf-8"`` is recommended unless you know that you need to use a different encoding.
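For illustration, a minimal sketch of what this recommendation looks like in practice. The file name and sample text are hypothetical; the point is only that passing ``encoding="utf-8"`` explicitly makes the round trip independent of the platform default.

```python
import os
import tempfile

# Hypothetical sample text containing non-ASCII characters.
text = "naïve café, 日本語\n"

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file used only for this sketch.
    path = os.path.join(tmpdir, "example.txt")

    # Write with an explicit encoding instead of relying on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        round_tripped = f.read()
```

Without the keyword, the same code could produce different bytes on disk depending on the platform's default encoding.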

Comment thread Doc/tutorial/inputoutput.rst Outdated
(see :func:`open`).
But passing ``encoding="utf-8"`` is highly recommended because
UTF-8 is the most commonly used encoding for now.
``'b'`` appended to the mode opens the file in :dfn:`binary mode`:

Member


Appending a ``'b'`` to the mode opens the file in :dfn:`binary mode`. Binary mode data is read and written as ``bytes`` objects without use of a codec.

Member


I'd use "encoding" instead of "codec". I don't see "codec" used anywhere else in this file.

@methane methane Apr 21, 2022

Member Author


(I didn't rewrite this paragraph at all. I just reflowed it.)

How about this?

Appending a ``'b'`` to the mode opens the file in :dfn:`binary mode`.
Binary mode data is read and written as :class:`bytes` objects.
You can not specify *encoding* when opening file in binary mode.
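A small sketch of the behavior this wording describes, assuming a hypothetical scratch file: binary mode yields ``bytes``, and passing *encoding* together with a binary mode raises ``ValueError``.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file used only for this sketch.
    path = os.path.join(tmpdir, "example.bin")

    # In binary mode the caller encodes the text to bytes explicitly.
    with open(path, "wb") as f:
        f.write("é".encode("utf-8"))

    # Reading in binary mode returns a bytes object, with no decoding applied.
    with open(path, "rb") as f:
        data = f.read()

    # Combining a binary mode with the encoding keyword is rejected by open().
    try:
        open(path, "rb", encoding="utf-8")
    except ValueError as exc:
        encoding_error = exc
    else:
        encoding_error = None
```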

Comment thread Doc/tutorial/inputoutput.rst Outdated

.. note::
JSON files must be encoded in UTF-8. Use ``encoding="utf-8"`` when opening
JSON file as :term:`text file` for both of reading and writing.

Member


"as a :term:`text file`": add the "a", and the "for both of reading and writing" text can go.
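As a sketch of what the note asks readers to do, assuming a hypothetical record and file name: open the JSON file as a text file with ``encoding="utf-8"`` for both ``json.dump`` and ``json.load``. Here ``ensure_ascii=False`` is used so that non-ASCII characters are actually written as UTF-8 rather than as ``\u`` escapes.

```python
import json
import os
import tempfile

# Hypothetical data with non-ASCII values.
record = {"name": "Zoë", "city": "東京"}

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file used only for this sketch.
    path = os.path.join(tmpdir, "record.json")

    # Write the JSON document as UTF-8 text.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False)

    # Read it back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)
```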

Comment thread Doc/tutorial/inputoutput.rst Outdated

:func:`open` returns a :term:`file object`, and is most commonly used with
two arguments: ``open(filename, mode)``.
two or three arguments: ``open(filename, mode, encoding=None)``

Member


We like the encoding to be a keyword for readability, so I'd word this along the lines of "two arguments, often with an encoding keyword when using a text mode" rather than including encoding in the count and mentioning two numbers. It feels clearer to me that way.

Member Author


At this point, we haven't described "text mode" yet. It is described below.

How about "two positional arguments and one keyword argument"?
Since binary files are rarer than text files, we can focus on text files in this first open() example.

@bedevere-bot


When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

methane commented Apr 25, 2022

Member Author

I have made the requested changes; please review again

@bedevere-bot


Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from gpshead April 25, 2022 09:13
@methane changed the title from gh-85679: Use encoding="utf-8" in tutorial to gh-85679: Recommend encoding="utf-8" in tutorial May 2, 2022
@methane methane merged commit 614420d into python:main May 2, 2022
@methane methane deleted the tutorial-utf8 branch May 2, 2022 08:25
@miss-islington

Contributor

Thanks @methane for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒⛏🤖

@bedevere-bot


GH-92133 is a backport of this pull request to the 3.10 branch.

@bedevere-bot


GH-92134 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 2, 2022
)

(cherry picked from commit 614420d)

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
miss-islington added a commit that referenced this pull request May 2, 2022
(cherry picked from commit 614420d)

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
miss-islington added a commit that referenced this pull request May 2, 2022
(cherry picked from commit 614420d)

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
)

(cherry picked from commit 614420d)

Co-authored-by: Inada Naoki <songofacandy@gmail.com>

Labels

docs (Documentation in the Doc dir), skip news

Development

Successfully merging this pull request may close these issues.

Use utf-8 in "Reading and Writing Files" tutorial.

5 participants