Always emit non-logical newlines for 'empty' lines by charliermarsh · Pull Request #27 · RustPython/Parser

charliermarsh · 2023-05-15T03:08:11Z

Summary

Right now, if you have a comment like:

# foo

The lexer emits a comment, but no newline. It turns out that if the lexer encounters an "empty" line, we skip the newline emission, and a comment counts as an "empty" line (see: eat_indentation, where we eat indentation and comments).

This PR modifies the lexer to emit a NonLogicalNewline in such cases. As a result, we'll now always have either a newline or non-logical newline token at the end of a line (excepting continuations). I believe this is more consistent with CPython. For example, given this snippet:

# Some comment

def foo():
    return 99

CPython outputs:

TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=60 (COMMENT), string='# Some comment', start=(1, 0), end=(1, 14), line='# Some comment\n')
TokenInfo(type=61 (NL), string='\n', start=(1, 14), end=(1, 15), line='# Some comment\n')
TokenInfo(type=61 (NL), string='\n', start=(2, 0), end=(2, 1), line='\n')
TokenInfo(type=1 (NAME), string='def', start=(3, 0), end=(3, 3), line='def foo():\n')
TokenInfo(type=1 (NAME), string='foo', start=(3, 4), end=(3, 7), line='def foo():\n')
TokenInfo(type=54 (OP), string='(', start=(3, 7), end=(3, 8), line='def foo():\n')
TokenInfo(type=54 (OP), string=')', start=(3, 8), end=(3, 9), line='def foo():\n')
TokenInfo(type=54 (OP), string=':', start=(3, 9), end=(3, 10), line='def foo():\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 10), end=(3, 11), line='def foo():\n')
TokenInfo(type=5 (INDENT), string='    ', start=(4, 0), end=(4, 4), line='    return 99\n')
TokenInfo(type=1 (NAME), string='return', start=(4, 4), end=(4, 10), line='    return 99\n')
TokenInfo(type=2 (NUMBER), string='99', start=(4, 11), end=(4, 13), line='    return 99\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 13), end=(4, 14), line='    return 99\n')
TokenInfo(type=61 (NL), string='\n', start=(5, 0), end=(5, 1), line='\n')
TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(6, 0), end=(6, 0), line='')

Note the NL tokens after the comment, and for the empty line, along with the NL token at the end prior to the dedent.
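For anyone who wants to reproduce the CPython listing above, here is a minimal sketch using the standard-library tokenize module. The helper names token_names and line_terminators are made up for this illustration and are not part of the PR. The second helper checks the property the description aims for: every physical line ends in either NEWLINE or NL.

```python
import io
import tokenize

# The snippet from the description, including its trailing blank line.
SOURCE = "# Some comment\n\ndef foo():\n    return 99\n\n"


def token_names(source):
    """Return (token name, starting row) pairs from CPython's tokenizer."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.start[0])
        for tok in tokenize.generate_tokens(readline)
    ]


def line_terminators(source):
    """Map each physical line to the name of the last token that starts on it.

    DEDENT and ENDMARKER are zero-width bookkeeping tokens, so they are skipped.
    """
    last = {}
    for name, row in token_names(source):
        if name not in ("DEDENT", "ENDMARKER"):
            last[row] = name
    return last


if __name__ == "__main__":
    for name, row in token_names(SOURCE):
        print(row, name)
    print(line_terminators(SOURCE))
```

Because generate_tokens reads from a str, no ENCODING token appears here. Token names are printed rather than numeric ids, since the ids differ between Python versions.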

charliermarsh · 2023-05-15T03:08:36Z

cc @MichaReiser

MichaReiser · 2023-05-15T06:08:19Z

parser/src/lexer.rs

                }
                Some('\n' | '\r') => {
                    // Empty line!
+                    let tok_start = self.get_pos();


Unrelated to your changes: I think this emits two newlines when using \r\n instead of one.

DimitrisJim · May 15, 2023

> I think this emits two newlines when using \r\n instead of one

Yeah, that should be the case. We could probably special-case this by looking at the next char and just advancing, but I can't remember whether that caused issues when I was tweaking this a while ago.

charliermarsh · May 15, 2023

I think it might actually work correctly? I think next_char already does the advancement:

// Helper function to go to the next character coming up.
fn next_char(&mut self) -> Option<char> {
    let mut c = self.window[0];
    self.window.slide();
    match c {
        Some('\r') => {
            if self.window[0] == Some('\n') {
                self.location += TextSize::from(1);
                self.window.slide();
            }
            self.location += TextSize::from(1);
            c = Some('\n');
        }
        #[allow(unused_variables)]
        Some(c) => {
            self.location += c.text_len();
        }
        _ => {}
    }
    c
}

DimitrisJim · May 15, 2023 (edited)

Ah, I had forgotten about that 😄 Tests would also probably fail if this were the case.
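To make the \r\n point concrete, here is a rough Python sketch of the same folding logic. It is a hypothetical stand-in, not the lexer's code: CharStream and chars are invented names, and offsets count code points here rather than the byte-based TextSize used in the Rust helper.

```python
from typing import List, Optional


class CharStream:
    """Hypothetical stand-in for the lexer's character window."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.location = 0  # offset of the next unread character

    def next_char(self) -> Optional[str]:
        """Return the next character, folding '\r\n' and lone '\r' into '\n'."""
        if self.location >= len(self.source):
            return None
        c = self.source[self.location]
        self.location += 1
        if c == "\r":
            # Swallow the '\n' of a '\r\n' pair so the caller sees one newline.
            if self.location < len(self.source) and self.source[self.location] == "\n":
                self.location += 1
            return "\n"
        return c


def chars(source: str) -> List[str]:
    """Collect every character the stream yields for the given source."""
    stream = CharStream(source)
    out = []
    while (c := stream.next_char()) is not None:
        out.append(c)
    return out
```

With this shape, a caller that matches on a newline character sees exactly one per line ending, whichever of \n, \r, or \r\n the source uses.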

youknowone

The change looks reasonable.
Because this repository lacks tests,
you may want to make a Ruff port before merging this PR.

Please feel free to merge it when you're ready to go.

youknowone · 2023-05-15T14:34:11Z

Ruff PR changed: astral-sh/ruff#4438

charliermarsh requested a review from youknowone May 15, 2023 03:08

charliermarsh mentioned this pull request May 15, 2023

Fix expected-indentation errors with end-of-line comments astral-sh/ruff#4418

Merged

MichaReiser approved these changes May 15, 2023

youknowone approved these changes May 15, 2023

charliermarsh force-pushed the charlie/newline branch 2 times, most recently from 4a738f0 to e1f408e Compare May 15, 2023 15:41

Always emit non-logical newlines for 'empty' lines

66ccbc8

charliermarsh force-pushed the charlie/newline branch from e1f408e to 66ccbc8 Compare May 15, 2023 15:43

charliermarsh mentioned this pull request May 15, 2023

Emit non-logical newlines for "empty" lines astral-sh/ruff#4444

Merged

charliermarsh merged commit 10dda12 into RustPython:main May 15, 2023

youknowone mentioned this pull request May 16, 2023

Fix full-lexer feature #38

Merged

charliermarsh mentioned this pull request Mar 12, 2024

Fix Indexer fails to identify continuation preceded by newline #10351 astral-sh/ruff#10354

Merged

charliermarsh merged 1 commit into RustPython:main from astral-sh:charlie/newline
