gh-96670: Raise SyntaxError when parsing NULL bytes #97594
pablogsal commented Sep 27, 2022 (edited by bedevere-bot)
- Issue: mishandling of c-strings in parser #96670
97a8f83 to cb89392
Wow, that's some error. Let me clean that up.
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Ugh, force-push. :-(
I was going to ask "doesn't this break things for encodings where NULLs are normal, like UTF-16", but then realized we never supported that anyway. PEP 263: "It does not include encodings which use two or more bytes for all characters like e.g. UTF-16. The reason for this is to keep the encoding detection algorithm in the tokenizer simple." In that sense, we get a better error message for UTF-16 files with this PR than before. Previously, it would silently fail (behaving as if the file were empty), and with this PR:
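For illustration only (this is not the output from the comment above): a minimal sketch of why UTF-16 source runs into the null-byte check. It uses the built-in `compile()` as a stand-in for the parser, which is an assumption; the comment is about reading a source file, and that path may report the error differently. The helper name `compile_outcome` is hypothetical. The exception type depends on the interpreter version: with this change it should be `SyntaxError`, while interpreters without it may raise `ValueError` from `compile()`, so the sketch accepts either.

```python
def compile_outcome(source):
    """Return the name of the exception raised by compile(), or "ok"."""
    try:
        compile(source, "<example>", "exec")
    except (SyntaxError, ValueError) as exc:
        # SyntaxError is expected with this change; ValueError is assumed
        # for older interpreters that reject null bytes in compile().
        return type(exc).__name__
    return "ok"


# ASCII text encoded as UTF-16 carries a zero byte in every code unit.
utf16_source = "x = 1\n".encode("utf-16")
has_null = b"\x00" in utf16_source

outcome = compile_outcome(utf16_source)
print(has_null, outcome)
```

The same helper returns `"ok"` for the plain string `"x = 1\n"`, which shows that the rejection comes from the zero bytes and not from the statement itself.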
Landing, thank you everyone for your great review and comments! And also thanks to @mdboom for checking the UTF-16 case, that was a good consideration :) P.S. Sorry for the force push :S