New Tokenizer API raise SyntaxError on 3.12 where it emits tokens on 3.11 #105238

Open
Carreau opened this issue Jun 2, 2023 · 0 comments
Carreau commented Jun 2, 2023

It looks like the new 3.12 tokenizer is considerably stricter: it raises SyntaxError where 3.11 and earlier would return tokens and an ENDMARKER.

This is a bit problematic, as it is common to want to at least tokenize unfinished or invalid Python, for example in IPython. See, for instance, the errors in ipython/ipython#14091.

We use tokenizing of incomplete input for a bunch of things:

In the REPL:

  • Is what the user typed incomplete but valid?
    • Yes: we insert a new line and let the user keep typing.
    • No: continue with compile() and show the user the error.
  • Is the tokenization valid, even though the code does not compile?
    • Yes:
    • Could it be shell-like input such as ls something? If so, detect it and do the right thing.
  • Detect invalid unicode better.
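
To make the first REPL decision above concrete, here is a minimal sketch of the incomplete / complete / invalid split. It uses the standard library's `codeop.compile_command` rather than IPython's own tokenizer-based logic, so it is a stand-in for illustration only; the helper name `classify_input` is hypothetical.

```python
import codeop


def classify_input(source):
    """Classify REPL input as 'incomplete', 'complete', or 'invalid'.

    Hypothetical helper: codeop.compile_command returns None when the
    source could still become valid with more lines, a code object when
    it is complete, and raises when it can never be valid.
    """
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except (SyntaxError, ValueError, OverflowError):
        return "invalid"
    return "incomplete" if code is None else "complete"


for text in ["x = (1 +", "x = 1", "x = )"]:
    print(repr(text), "->", classify_input(text))
```

This only covers the "does it compile" half of the list; detecting shell-like input or bad unicode still needs the token stream itself, which is why the tokenizer change matters.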

Most of those now raise SyntaxError in the new tokenizer. I'm actually not even sure the old tokenizer ever raised SyntaxError at all; it seems it only raised tokenize.TokenError before.

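import io
from tokenize import generate_tokens

fails_in312 = [
    "a = [1,\n2,",
    "1 \n\n1 \\ ",  # backslash escaped explicitly; same string value as before
    "x =(1+",
    "import os, \\",
    'a="""',
    "\xc3\xa9\n",
]


for f in fails_in312:
    # generate_tokens expects a readline-style callable that returns one line per call
    list(generate_tokens(io.StringIO(f).readline))  # ok on 3.11, raises SyntaxError on 3.12
...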

(This is a bit hand-waved, as I have not yet been able to get 3.12 on my local machine, but all of the above examples should come from IPython's failing CI on 3.12.)
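
For comparing these samples across interpreter versions, a small wrapper that keeps whatever tokens were produced and records the exception, instead of letting it propagate, may help. This is only a sketch of a possible downstream workaround: the name `tokenize_leniently` is hypothetical, and it catches both tokenize.TokenError and SyntaxError so the caller does not depend on which one a given version raises.

```python
import io
import tokenize


def tokenize_leniently(source):
    """Return (tokens, error) for a source string.

    tokens holds every token yielded before a failure; error is the
    caught exception, or None if tokenizing reached the end cleanly.
    Hypothetical helper, not part of the tokenize module.
    """
    tokens = []
    error = None
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append(tok)
    except (tokenize.TokenError, SyntaxError) as exc:
        error = exc
    return tokens, error


for sample in ["x = 1\n", "x =(1+", 'a="""']:
    toks, err = tokenize_leniently(sample)
    print(repr(sample), len(toks), type(err).__name__)
```

Printing the exception type per sample makes it easy to see, on each Python version, which inputs fail and with which exception class.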

I'm assuming those changes are on purpose, due to the new parser for f-strings. I was just noting on social media that I don't know if or how I'm going to handle this breakage, or even whether it's possible to handle it on the IPython side alone. I was still advised to open an issue anyway, as it is a change of API that breaks downstream projects, so here it is.

Still, it would be great for the tokenizer to only tokenize, and for the parser to raise the SyntaxError, as there are valid reasons to tokenize incomplete or invalid Python.

Your environment

  • CPython versions tested on: 3.12-dev from GitHub action.
  • Operating system and architecture: linux.

Thanks

@Carreau Carreau added the type-bug An unexpected behavior, bug, or error label Jun 2, 2023
@pablogsal pablogsal removed the type-bug An unexpected behavior, bug, or error label Jun 2, 2023