gh-96670: Raise SyntaxError when parsing NULL bytes #97594

pablogsal · 2022-09-27T11:00:39Z

Issue: mishandling of c-strings in parser #96670

gvanrossum

What's up with python3.12.abi.new having 22,644 additions?

gvanrossum

LG except for one word in an error message, I think.

Parser/tokenizer.c

pablogsal · 2022-09-27T18:13:39Z

"What's up with python3.12.abi.new having 22,644 additions?"

Wow, that's some error. Let me clean that up.

Objects/fileobject.c

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

gvanrossum · 2022-09-27T20:34:16Z

Ugh, force-push. :-(

gvanrossum

I see nothing else wrong, let's wait for Lysandros' LGTM.

lysnikolaou

LGTM as well!

mdboom · 2022-09-27T21:22:56Z

I was going to ask "doesn't this break things for encodings where NULLs are normal, like UTF-16", but then realized we never supported that anyway:

PEP 263: "It does not include encodings which use two or more bytes for all characters like e.g. UTF-16. The reason for this is to keep the encoding detection algorithm in the tokenizer simple."

In that sense, we get a better error message for UTF-16 files with this PR than before. Previously, it would fail silently (behaving as if the file were empty); with this PR:

  File "x.py", line 1
    #
SyntaxError: source code cannot contain null bytes
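
For anyone who wants to check the UTF-16 point locally, here is a minimal sketch. It goes through the built-in compile() rather than running a file, so the exact message may differ from the one above, and the helper name `compiles` is made up for this illustration. It assumes only that compile() rejects sources containing null bytes: as far as I know, newer interpreters raise SyntaxError and older ones raised ValueError, so the sketch catches both.

```python
def compiles(source) -> bool:
    """Return True if `source` (str or bytes) compiles as a module.

    Hypothetical helper for illustration only; not part of this PR.
    """
    try:
        compile(source, "<example>", "exec")
    except (SyntaxError, ValueError):
        # Depending on the interpreter version, a null byte in the source
        # is reported as either SyntaxError or ValueError.
        return False
    return True


# A one-line source file, encoded the way a UTF-16 editor would save it.
utf16_source = "# a comment\n".encode("utf-16")

# ASCII characters take two bytes each in UTF-16, one of which is a NUL.
print(b"\x00" in utf16_source)

print(compiles("# a comment\n"))  # plain text source
print(compiles(utf16_source))     # same text, UTF-16 bytes
```

The plain string compiles, while the UTF-16 bytes are rejected because every ASCII character carries a null byte with it.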

pablogsal · 2022-09-27T22:25:39Z

Landing, thank you everyone for your great review and comments! And thanks also to @mdboom for checking UTF-16, that was a good consideration :)

P.S. Sorry for the force push :S

pablogsal requested a review from lysnikolaou as a code owner Sep 27, 2022

bedevere-bot added the awaiting core review label Sep 27, 2022

pablogsal requested a review from gvanrossum Sep 27, 2022

pablogsal force-pushed the gh-96670 branch 5 times, most recently from 97a8f83 to cb89392 Sep 27, 2022

gvanrossum reviewed Sep 27, 2022

Parser/tokenizer.c (outdated)

pablogsal force-pushed the gh-96670 branch from cb89392 to 490f5bd Sep 27, 2022

pablogsal requested a review from gvanrossum Sep 27, 2022

lysnikolaou reviewed Sep 27, 2022

Objects/fileobject.c

pablogsal force-pushed the gh-96670 branch from 490f5bd to 4b1105c Sep 27, 2022

gh-96670: Raise SyntaxError when parsing NULL bytes

e10909d

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

pablogsal force-pushed the gh-96670 branch from 4b1105c to e10909d Sep 27, 2022

gvanrossum reviewed Sep 27, 2022

lysnikolaou approved these changes Sep 27, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Sep 27, 2022

pablogsal merged commit aab01e3 into python:main Sep 27, 2022
14 checks passed

pablogsal deleted the gh-96670 branch Sep 27, 2022

bedevere-bot removed the awaiting merge label Sep 27, 2022

vstinner mentioned this pull request Sep 28, 2022

mishandling of c-strings in parser #96670

Closed