Skip to content

New input transformation machinery#11041

Merged
Carreau merged 73 commits intoipython:masterfrom
takluyver:inputtransformer2
Sep 1, 2018
Merged

New input transformation machinery#11041
Carreau merged 73 commits intoipython:masterfrom
takluyver:inputtransformer2

Conversation

@takluyver
Copy link
Copy Markdown
Member

See #10272. This is a major change which I expect to be the trigger for IPython 7.0.

Main changes

  • IPython.core.inputtransformer2 is added, containing the new machinery to transform input before we parse it.
  • IPython.core.inputsplitter and IPython.core.inputtransformer are deprecated. We can't remove them immediately because nbconvert relies on them.
  • IPython.utils.tokenize2 is removed. This was a copy of the stdlib tokenize module with a couple of issues patched. One of them (bpo 12691) was fixed in Python 3.4.1, which we will require for IPython 7 (see Drop support for Python 3.3 #10833). The other (bpo 17061) turned out to be intended behaviour, so I've handled it another way in inputtransformer2.

Rationale

This will break any code that extends the current input transformation machinery (and the interface was documented), and any change this big is bound to cause some unexpected bugs. So why should we do this?

  • Maintainability: I wrote the current input transformation machinery when I'd just learned about coroutines, and I think I got carried away. I don't think anyone else wants to touch that code, and I'm wary of doing so myself. I don't think I can make the code to do this simple - it's just too complex a task - but I believe the new code is more quite a bit simpler than what it replaces.
    • Code size is not a good measure of complexity, but FWIW, 550 lines in inputtransformer2 replace 1300 lines in inputsplitter+inputtransformer.
  • Weird quirks: I hope that the new framework, which uses tokens to recognise our special syntax, will be more robust against corner cases involving comments, multiline strings, etc. ipython incorrectly suppresses backslashes at end of lines in raw multiline strings #10972 appears to be fixed. Hopefully ! after line continuation causes SyntaxError #8987 is too, though I can't currently reproduce it on master.
  • Performance: despite some improvements to the existing code in Improve performance of InputSplitter #10898, it still gets pathologically slow with long enough code, as discussed in infinite loop  #11026. That latter issue contains a sample of ~1500 lines of plain Python code (a giant dictionary literal), which takes ~20s to run through inputsplitter on master before it can be executed. The new transformation machinery in this PR processes it in about 30ms, a speedup of three orders of magnitude. 😎

Limitations

  • Worst case performance: Every piece of special syntax requiring token based transformation (%line_magics, !system commands, help?, etc.) results in another tokenisation pass after the transformation. So a large code cell with many instances of special syntax could take longer to parse than it currently does - although the current code isn't very efficient either (see above). I haven't tried to measure this. My intuition is that large cells tend to be regular Python code, as in infinite loop  #11026, in which case the new implementation can comfortably beat the old one.
  • Extensibility of 'is this complete?' checking: the old code allows transformers to 'accumulate' code to indicate that they'd like further lines of input from the user. The new system currently has no equivalent. It's a source of complexity which I'm not sure if anyone actually needs, so I propose we skip it for now and see how many people complain.

This can obscure underlying errors
This should probably have some tests.
This allows for fewer pieces to know about input_transformer_manager.
@takluyver takluyver added this to the 7.0 milestone Mar 11, 2018
@takluyver takluyver mentioned this pull request Mar 11, 2018
Copy link
Copy Markdown
Member

@willingc willingc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takluyver This looks really good. The code looks clean. Most of my suggestions are related to docstrings and improving understanding. After merging we should probably go back and do another pass with docs. Thank you 👍

"""Input transformer machinery to support IPython special syntax.

This includes the machinery to recognise and transform ``%magic`` commands,
``!system`` commands, ``help?`` querying, prompt stripping, and so forth.
Copy link
Copy Markdown
Member

@willingc willingc Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add to comment:

Added: IPython 7.0. Replaces inputsplitter and inputtransformer which were deprecated in 7.0.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"""Remove leading indentation.

If the first line starts with a spaces or tabs, the same whitespace will be
removed from each following line.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...line in the block.

Parameters
----------
prompt_re : regular expression
A regular expression matching any input prompt (including continuation)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continuation `...`)

initial_re : regular expression, optional
A regular expression matching only the initial prompt, but not continuation.
If no initial expression is given, prompt_re will be used everywhere.
Used mainly for plain Python prompts, where the continuation prompt
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plain Python prompts (typically >>>), where the

return ['get_ipython().run_cell_magic(%r, %r, %r)\n'
% (magic_name, first_line, body)]

# -----
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray line?


@property
def input_splitter(self):
"""Make this available for compatibility
Copy link
Copy Markdown
Member

@willingc willingc Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this available for backward compatibility (pre-7.0 release) with existing code.

For example, ipykernel...

New code should use input_transformer_manager.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willingc Your 7 and 0 keys are either swapped, or you're used to software below 1.0 :-) I've edited a couple of your comments to say 7.0 instead of 0.7 as I think that's what you meant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had a backwards moment 😉

-------
result : :class:`ExecutionResult`
"""
result = None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Good catch


return result

def transform_cell(self, raw_cell):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring explaining static input transformation vs dynamic transformations

@@ -0,0 +1,189 @@
import nose.tools as nt
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docstring similar to this:

"""Tests for the token-based transformers in IPython.core.inputtransformer2

Line-based transformers are the simpler ones; token-based transformers are
more complex.
"""


kb.add('c-o', filter=(has_focus(DEFAULT_BUFFER)
& emacs_insert_mode))(newline_autoindent_outer(shell.input_splitter))
& emacs_insert_mode))(newline_autoindent_outer(shell.input_transformer_manager))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on style here. Perhaps going to a longer line length would improve readability and understanding. i.e. line 74 is more understandable than 71-72 and 66-70

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Jun 21, 2018

I'll re-read all of that later, flying tomorrow. If I have wifi in the plane, I'll do that then.

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Jun 22, 2018

I push a docstring and a couple of deprecation warnings.
Feel free to remove if you like.

@takluyver
Copy link
Copy Markdown
Member Author

Thanks @willingc for the detailed review! I think I've basically applied all of your suggestions.

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Jun 24, 2018

cc @dakoop, @colinjbrown from https://github.com/dataflownb/dfkernel I believe you are using input transformer. Just to make sure this will not affect you too much.

@dakoop
Copy link
Copy Markdown

dakoop commented Jun 25, 2018

@Carreau, thanks for the heads-up. Shouldn't affect us much, and it looks like a nice improvement.

@willingc
Copy link
Copy Markdown
Member

@takluyver Traveling today but trust that you covered all of my suggestions. I'll give a look tomorrow.

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Aug 28, 2018

@takluyver, I'd like to try to cut a 7.0 current september. We had no complaint so I'm assuming everyone is +1, so I suggest fixing conflict (relatively trivial), merging and moving on.

With that we can get a beta for about a week, then RC and final by hopefully mid to late september. How does that sound ?

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Aug 29, 2018

I've fixed the conflicts.

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Sep 1, 2018

I've fixed the conflicts.

I've fixed new conflict due to the reactivation of autoindent.

@takluyver
Copy link
Copy Markdown
Member Author

Thanks for fixing up the conflicts; I would be happy to see this land (though I'm not looking forward to finding out what the change has broken).

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Sep 1, 2018

I would be happy to see this land

So be it,

though I'm not looking forward to finding out what the change has broken

I'll help you fix stuff, worse case we revert and re-iterate.

There is only a couple of things left for 7.0 (and one test regression I'd like to backport #11277 for 5.9) and we can do a beta.

@Carreau Carreau merged commit 0fbad83 into ipython:master Sep 1, 2018
@Carreau
Copy link
Copy Markdown
Member

Carreau commented Sep 1, 2018

🎉 🍸 :pinot grigio:

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Sep 1, 2018

I'm sure even Mr Meeseeks is excited...

@lumberbot-app
Copy link
Copy Markdown
Contributor

lumberbot-app bot commented Sep 1, 2018

congratulation Thomas, good work !

@lumberbot-app
Copy link
Copy Markdown
Contributor

lumberbot-app bot commented Sep 1, 2018

party parrotparty parrotparty parrotparty parrotparty parrotparty parrotparty parrotparty parrotparty parrotparty parrot

@takluyver
Copy link
Copy Markdown
Member Author

:-P

Does saying the bots name and the word 'excited' in the same comment trigger that?

@Carreau
Copy link
Copy Markdown
Member

Carreau commented Sep 2, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants