New Tokenizer API raises SyntaxError on 3.12 where it emits tokens on 3.11 #105238

Carreau · 2023-06-02T13:02:23Z

It looks like the new 3.12 tokenizer is much stricter: it raises SyntaxError on input for which 3.11 and earlier would return tokens followed by ENDMARKER.

This is problematic, because it is common to want to at least tokenize unfinished or invalid Python, for example in IPython. See, for instance, the errors in ipython/ipython#14091.

We tokenize incomplete input for a number of things:

- In the REPL:
  - Is what the user typed incomplete but valid?
    - Yes: insert a new line and let the user keep typing.
    - No: continue with compile() and show the user the error.
  - Is the tokenization valid, even though the code does not compile?
    - Yes: could it be a bash-like command such as "ls something"? If so, detect it and do the right thing.
- Detect invalid Unicode better.
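The first REPL decision above (incomplete but valid, or invalid) can be sketched with the standard library's codeop.compile_command, which returns None for input that is incomplete but could still become valid, returns a code object for complete input, and raises SyntaxError (or ValueError/OverflowError) for invalid input. This is a minimal sketch under that assumption, not necessarily how IPython implements it; classify_input is a hypothetical name.

```python
import codeop


def classify_input(source: str) -> str:
    """Hypothetical helper: classify REPL input as incomplete, complete or invalid."""
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except (SyntaxError, ValueError, OverflowError):
        # Invalid: a REPL would show the user the error from compile().
        return "invalid"
    if code is None:
        # Incomplete but possibly valid: insert a new line and keep reading.
        return "incomplete"
    return "complete"


print(classify_input("if True:"))  # incomplete
print(classify_input("x = 1"))     # complete
print(classify_input("x = )"))     # invalid
```

Because this check goes through the compiler rather than tokenize, it does not depend on which exception the tokenizer raises.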

Most of these now raise SyntaxError with the new tokenizer. I'm not even sure the old tokenizer ever raised SyntaxError at all; it seems it only raised tokenize.TokenError before.
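Downstream code that must run on both 3.11 and 3.12 could tolerate either exception and keep whatever tokens were produced before the failure. A minimal sketch, assuming that tokenize.TokenError (3.11 and earlier) and SyntaxError (reported here for 3.12) are the only errors to absorb; tokenize_partial is a hypothetical helper, and the set of tokens yielded before an error may differ between versions.

```python
import io
import tokenize


def tokenize_partial(source: str):
    """Hypothetical helper: return (tokens, error) instead of raising.

    Collects the tokens generate_tokens() yields before it fails and catches
    both tokenize.TokenError and SyntaxError. Note that catching SyntaxError
    also catches its subclasses, such as IndentationError.
    """
    tokens = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append(tok)
    except (tokenize.TokenError, SyntaxError) as exc:
        return tokens, exc
    return tokens, None


toks, err = tokenize_partial("x = 1\n")
print([t.string for t in toks if t.type == tokenize.NAME], err)  # ['x'] None
```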

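import io
from tokenize import TokenError, generate_tokens

fails_in312 = [
    "a = [1,\n2,",
    "1 \n\n1 \\ ",
    "x =(1+",
    "import os, \\",
    'a="""',
    "\xc3\xa9\n",
]


for f in fails_in312:
    # generate_tokens() expects a readline-style callable that returns one
    # line per call; iter(f).__next__ would instead return one character.
    try:
        list(generate_tokens(io.StringIO(f).readline))  # tokens (or TokenError) on 3.11; SyntaxError on 3.12
    except (SyntaxError, TokenError) as exc:
        # TokenError is what 3.11 and earlier raise for unterminated input.
        print(repr(f), type(exc).__name__, exc)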
...

(This is a bit hand-waved, as I have not yet been able to get 3.12 on my local machine, but all the examples above should come from IPython's failing CI on 3.12.)

I'm assuming these changes are intentional, due to the new parser for f-strings. I was just noting on social media that I don't know if or how I'm going to handle this breakage, or even whether it's possible to handle it on the IPython side alone. I was advised to open an issue anyway, since it is an API change that breaks downstream code, so here it is.

Still, it would be great if the tokenizer only tokenized and left raising SyntaxError to the parser, as there are valid reasons to tokenize incomplete or invalid Python.
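Keeping tokenizing and parsing as separate stages is what makes checks like the bash-like "ls something" case possible: such input tokenizes cleanly into two NAME tokens but fails to compile. A minimal sketch of that distinction, with hypothetical helper names tokenizes and compiles; the heuristic for deciding what counts as shell-like is left out.

```python
import io
import tokenize


def tokenizes(source: str) -> bool:
    """Hypothetical helper: True if the tokenizer accepts the whole input."""
    try:
        list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, SyntaxError):
        return False
    return True


def compiles(source: str) -> bool:
    """Hypothetical helper: True if the compiler accepts the input."""
    try:
        compile(source, "<input>", "exec")
    except SyntaxError:
        return False
    return True


# Tokenizes fine but is not valid Python: a candidate for shell-like handling.
src = "ls something"
print(tokenizes(src), compiles(src))  # True False
```

If the tokenizer itself rejected more input with SyntaxError, tokenizes() would return False for such cases and this two-stage check would lose its distinguishing power.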

Your environment

CPython versions tested on: 3.12-dev from GitHub Actions.
Operating system and architecture: Linux.

Thanks

Carreau added the type-bug An unexpected behavior, bug, or error label Jun 2, 2023

pablogsal removed the type-bug An unexpected behavior, bug, or error label Jun 2, 2023

Comments

Carreau commented Jun 2, 2023

Your environment