bpo-32236: Issue RuntimeWarning if buffering=1 for open() in binary mode#4842
bpo-32236: Issue RuntimeWarning if buffering=1 for open() in binary mode#4842vstinner merged 12 commits intopython:masterfrom
Conversation
If buffering=1 is specified for open() in binary mode, it is silently treated as buffering=-1 (i.e., the default buffer size). Coupled with the fact that line buffering is always supported in Python 2, such behavior caused several issues (e.g., bpo-10344, bpo-21332). Raise ValueError on attempts to use line buffering in binary mode to prevent possible bugs and expose existing buggy code.
gpshead
left a comment
There was a problem hiding this comment.
The subprocess related changes are correct, others should review the other changes. Though at first glance I believe those to also be correct.
vstinner
left a comment
There was a problem hiding this comment.
Please add a NEWS entry to mention that open() now emits a warning if the file is opened in binary mode with buffering=1. Moreover, the NEWS entry should mention that the change fixes a performance issue in subprocess.Popen if buffering=1: the underlying binary file is no longer opened with inefficient buffering=1, if I understood correctly.
| try: | ||
| f = self.open(TESTFN, 'wb', s) | ||
| f.write(str(s).encode("ascii")) | ||
| f.close() |
There was a problem hiding this comment.
what's the point of calling close() twice here? with should be used, no?
There was a problem hiding this comment.
This is just a refactor of an existing test.
| except OSError as msg: | ||
| self.fail('error setting buffer size %d: %s' % (s, str(msg))) | ||
| self.assertEqual(d, s) | ||
| for s in (-1, 0, 512): |
There was a problem hiding this comment.
Would it be possible to check that these calls don't emit RuntimeWarning?
There was a problem hiding this comment.
I've added the check.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Lib/_pyio.py
Outdated
| raise ValueError("binary mode doesn't take a newline argument") | ||
| if binary and buffering == 1: | ||
| import warnings | ||
| warnings.warn("line buffering is not supported in binary mode", |
There was a problem hiding this comment.
It's not necessary obvious why the warning talks about line buffering, so I would be a bit more explicit: "line buffering (buffering = 1) isn't supported in binary mode, file will be fully buffered."
There was a problem hiding this comment.
OK, I've clarified the message as you suggest.
| try: | ||
| f = self.open(TESTFN, 'wb', s) | ||
| f.write(str(s).encode("ascii")) | ||
| f.close() |
There was a problem hiding this comment.
This is just a refactor of an existing test.
Lib/test/test_io.py
Outdated
| self.large_file_ops(f) | ||
|
|
||
| def _checked_open(self, path, mode, bufsize, under_check=False): | ||
| if not under_check and bufsize == 1: |
There was a problem hiding this comment.
What is under_check for? It doesn't seem to be ever True.
There was a problem hiding this comment.
This function was a conditional context manager, and under_check is set to True in the recursive call. But I've removed that function anyway to have only one place with warning check (in test_file.py).
| for s in (-1, 0, 512): | ||
| self._checkBufferSize(s) | ||
|
|
||
| # test that attempts to use line buffering in binary mode cause |
There was a problem hiding this comment.
No need to test this both here and in test_io.py, IMHO.
There was a problem hiding this comment.
OK, I've preserved only one check here. I think that test_with_open in test_io.py wasn't a really good place for that check after all.
|
@gpshead, @vstinner, @pitrou Thank you for your reviews! I have made the requested changes; please review again. Also, should the docs be changed to reflect that |
|
@vstinner Regarding the NEWS, there was no performance issue with |
|
@vstinner Sorry, I misread your comment -- it was about |
My previous understanding of behavior of check_no_resource_warning(), which check_no_warnings() is based on, was incorrect.
Lib/_pyio.py
Outdated
| if binary and buffering == 1: | ||
| import warnings | ||
| warnings.warn("line buffering (buffering=1) isn't supported in binary " | ||
| "mode, file will be fully buffered", RuntimeWarning, 2) |
There was a problem hiding this comment.
There is no such thing as "fully buffered". I propose to also say "the default size will be used" here.
Extract of the code:
if buffering == 1 or buffering < 0 and raw.isatty():
buffering = -1
line_buffering = True
if buffering < 0:
buffering = DEFAULT_BUFFER_SIZE
buffering=1 becomes buffering=DEFAULT_BUFFER_SIZE if I understand correctly.
Lib/subprocess.py
Outdated
| if self.text_mode: | ||
| if bufsize == 1: | ||
| line_buffering = True | ||
| # line buffering is not supported in binary mode |
There was a problem hiding this comment.
But here we are in text mode, no? The comment is misleading, I suggest to remove it.
Or say something like: "# Use the default buffer size for the binary buffered reader/writer".
There was a problem hiding this comment.
I was referring to the binary mode used for the underlying streams, and I didn't realize that it can be misleading, but now I see it. Changed to clearer wording, thanks!
Modules/_io/_iomodule.c
Outdated
| if (binary && buffering == 1) { | ||
| if (PyErr_WarnEx(PyExc_RuntimeWarning, | ||
| "line buffering (buffering=1) isn't supported in " | ||
| "binary mode, file will be fully buffered", 1) < 0) |
There was a problem hiding this comment.
Ditto: say something like "the default size will be used".
PEP 7: please add { ... } around the "goto error;". By the way, see https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/ :-)
There was a problem hiding this comment.
Yes, I'm aware of PEP 7 and goto fail bug, but I usually follow the rule of local consistency, and a similar code above doesn't use brackets. Now I've added brackets here as you suggested.
| @@ -0,0 +1,2 @@ | |||
| Warn that line buffering is not supported if :func:`open` is called with | |||
| binary mode and ``buffering=1``. | |||
There was a problem hiding this comment.
I would prefer to be more explicit and list more impact functions: open, io.open, codecs.open. Would it be possible to put functions first in the sentence? Something like: "open(), io.open(), codecs.open() now emit a warning if the file is opened in binary mode with buffering=1." So the reader directly gets the context.
There was a problem hiding this comment.
Some other functions are impacted too, like pathlib.Path.open() and tempfile.TemporaryFile() (and some other classes from tempfile). Do you think we should find and list all such functions here? Maybe we should just provide a couple of examples: io.open and the standard library functions relying on it (e.g., codecs.open, pathlib.Path.open) now issue a warning if a file is opened in binary mode with buffering=1.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair! |
|
I ran the full test suite on this change using "PYTHONWARNINGS=error ./python -W error -m test -j0 -u all,-gui -r" to check if any stdlib module misuse open(). Good news: "Tests result: SUCCESS". |
If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).
Raise ValueError on attempts to use line buffering in binary mode to prevent
possible bugs and expose existing buggy code.
https://bugs.python.org/issue32236