Reimplement parser comment attachment to avoid duplicates and generally improve it by loganfsmyth · Pull Request #9084 · babel/babel

loganfsmyth · 2018-11-26T03:56:43Z

Q	A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?	No-ish. We already had pretty bad guarantees about how comments behave, and I think this PR is a universal improvement.
Minor: New Feature?
Tests Added + Pass?	Yes
Documentation PR Link
Any Dependency Changes?
License	MIT

It's hard to say if this will affect anyone. The changes have the potential to cause comments to disappear if users use the new parser changes with the old babel-traverse. That said, we've never provided much in the way of guarantees for comments anyway. Chances are, any complaints would be resolved by ensuring users have upgraded both core/traverse and the parser. In 99% of cases I'd expect those to all be updated at the same time anyway.

This PR had one primary goal, which is to stop inserting comments into multiple locations in the AST. To do that though, I've tried to formalize a reasonable set of rules around how we attach comments in the first place, so people can have a reasonable intuition about it.

The rules come down to this:

Leading comments are any comments that
- Were not already consumed as trailing comments for a previous node
- Come fully before the node.start offset
- Will always be the shallowest node to satisfy the node.start
Inner comments are any comments that
- Show up within the node.start and node.end range of a node
- Have not been attached to any child nodes, which I think means they'll only show up on nodes that literally had no children.
Trailing comments are any comments that
- Come after the last child node of a given parent, but are before the parent's parent.end
- Come after a node but before one of ], ) or }
- If the next sibling node is on the same line
  - Come immediately after a node, with no tokens between
- If the next sibling node is not on the same line
  - Come immediately after a node, with no tokens between
  - Come immediately after a node and any trailing , or ; tokens on the same line as the node end

From what I've seen, these rules seem very satisfactory and fit well with my own intuition. I'm curious what others think.

babel-bot · 2018-11-26T04:00:25Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/9461/

babel-bot · 2018-11-26T04:00:25Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/9461/

danez · 2018-11-26T05:24:09Z

I will have a look at it soon. I know we had problems with changing locations of comments in the past and I experienced them in react-docgen so I will plug this PR into there and see what fails and if i can make the jsdoc parsing work correctly or it just happens to work. We can also try to run prettier with this change and see if their tests work.

danez · 2018-11-26T06:02:31Z

Ok I plugged it into prettier and one test failed, but the recorded snapshots looks incorrect. The changed comment behavior in this PR seems to fix it 👍:

input:

const Component = branch/*::     <Props, ExternalProps> */(
  ({ src }) => !src,
  // $FlowFixMe
  renderNothing,
)(BaseComponent);

const C = b/*:: <A> */(foo) + c/*:: <B> */(bar);

foo/*::<bar>*/(baz);

foo/*::<bar>*/();

output diff:

- Expected
+ Received

 const Component = branch/*:: <Props, ExternalProps> */(
-   /*::     <Props, ExternalProps> */ ({ src }) => !src,
+   ({ src }) => !src,
    // $FlowFixMe
    renderNothing
 )(BaseComponent);
 
- const C = b/*:: <A> */(/*:: <A> */ foo) + c/*:: <B> */(/*:: <B> */ bar);
+ const C = b/*:: <A> */(foo) + c/*:: <B> */(bar);
 
- foo/*:: <bar> */(/*::<bar>*/ baz);
+ foo/*:: <bar> */(baz);
 
 foo/*:: <bar> */();

danez · 2018-11-26T06:59:04Z

react-docgen has some problems, but only because it relies on the fact that comments are attached as trailing on the decorator and not only as leading on the id of the class or the classbody.
https://github.com/reactjs/react-docgen/blob/master/src/handlers/componentDocblockHandler.js#L39-L46

export default
@decorator
/** class description */
class A {}

I guess this can be fixed somehow, although I can't see an easy solution, as there is no location recorded in the AST where the class keyword is located.

so in the case of

export default /** Decorator description */ @decorator /** Description */ class /** Base */SomeName {}

it is possible to know where the first comment belongs to, but it is not really possible to know which of the other two comments come before and which after the class keyword and which react-docgen should treat as the description of the class. Both of the two comments are attached to the Identifier or the Classbody (if there is no id).

Any idea about this special case? It is the same case that broke in the past and whythis hack is present in react-docgen.

danez · 2018-11-26T07:35:10Z

In general the ruleset for the comment attachment make sense to me and I feel having it clearly defined is really good.

I'm worried about the potential of breaking stuff and I think this change maybe should not be in a minor version. Not necessarily because of Babel users but for tools that use Babylon standalone. (As mentioned above)
Though maybe this change might be a good candidate for a feature flag as discussed in the last meeting. It could be enabled in Babel by default if we know it is not breaking the default Babel use cases.

I haven't looked at the code changes so far. Will do it tomorrow.

nicolo-ribaudo · 2018-11-26T13:25:02Z

Leading comments are any comments that

Were not already consumed as trailing comments for a previous node

Come fully before the node.start offset

Will always be the shallowest node to satisfy the node.start

I think that we should add another rule:

Are not separated from the current node by any token

This rule would change the behavior in cases like this one:

if /* a */ (x) {}

Currently /* x */ is a leading comment of the Identifier, but with the new rule it wouldn't be assigned to x and thus would become an inner comment of the IfStatement.

It would also fix this case, where /* x */ is considered a leading comment of the BlockStatement:

for (/* x */;;) {}

Apart from that, I think that these rules are pretty good. We should write them somewhere else other than in this PR so that they don't get lost.

I'll review the code later.

@danez

export default /** Decorator description */ @decorator /** Description */ class /** Base */SomeName {}

Special-casing that use-case would unfortunately break these:

/**
 * Class description
 */
@frozen class Foo {}

@frozen // Prevent modifications to the class
class Foo {}

nicolo-ribaudo · 2018-11-26T13:40:43Z

packages/babel-parser/src/parser/comments.js

+      // comments between nodes and these tokens automatically count as
+      // trailing comments.
+      groupEnd:
+        type === tt.bracketR || type === tt.braceR || type === tt.parenR,


Should we also add >, since it is used as a groupEnd for type parameters?

danez · 2018-11-27T01:17:03Z

@nicolo-ribaudo

This rule would change the behavior in cases like this one:
if /* a */ (x) {}
Currently /* x */ is a leading comment of the Identifier, but with the new rule it wouldn't be assigned to x and thus would become an inner comment of the IfStatement.

I think this would actually also help with the example I posted if i understood it correctly:

/** leading on ExportDeclaration */
export default 
/** leading on Decorator */ 
@decorator 
/** inner on class because of class keyword between comment and identifier */
class /** leading on Identifier */SomeName 
/** leading on body */
{}

Is that correct?

loganfsmyth · 2018-11-27T02:10:29Z

@nicolo-ribaudo

Currently /* x */ is a leading comment of the Identifier, but with the new rule it wouldn't be assigned to x and thus would become an inner comment of the IfStatement.

My aim with this change was to try to stay in the bounds of what we already do wherever possible. innerComments is very limited for now and I think that's realistically for the best. Since don't have a Concrete Syntax Tree, it's not at all clear how innerComments should behave in anything but the simplest of cases. I think this if probably is one of those cases, but I'd really love to avoid addings special cases for specific node types. I'm also consider moving to more innerComments to be more likely to qualify as breaking, since babel-generator only really expects innerComments in a few places.

It would also fix this case, where /* x */ is considered a leading comment of the BlockStatement:

for (/* x */;;) {}

The issue here is innerComments still isn't good enough resolution anyway. What would babel-generator do?

{
  type: "ForStatement",
  innerComments: [{ type: "CommentBlock", value: " x " }],
}

could serialize as any of

for /* x */(;;) {}
for (/* x */;;) {}
for (;/* x */;) {}
for (;;/* x */) {}

technically. My preference for this patch is to leave the preference for comments to attach as leading comments on the next-closest node.

Should we also add >, since it is used as a groupEnd for type parameters?

See below reply to @danez.

@danez

export default
@decorator
/** class description */
class A {}

This should be a trailing comment on the decorator, but it's a case I missed. I think this and the case above about > may point at a change I should make around how I handle sibling tokens. I actually wonder if I could avoid the groupEnd token ideal altogether. Let me explore and see if I run into any issues.

I'm worried about the potential of breaking stuff and I think this change maybe should not be in a minor version. Not necessarily because of Babel users but for tools that use Babylon standalone. (As mentioned above)
Though maybe this change might be a good candidate for a feature flag as discussed in the last meeting. It could be enabled in Babel by default if we know it is not breaking the default Babel use cases.

Yeah, I'm on the fence. I think partly it depends on how close we can get to the previous behavior. In an ideal worth, this patch would

Avoid putting comments in two places
Exclusively remove duplicates in the places where people didn't expect them to be anyway

You two have pointed out a big case I missed, so thanks a ton for that. I can also do another pass to see what cases might diverge from those 2 points. The logic I added around permeable is the only place I know of off the top of my head, since previously

[
  foo, // comment
  other
]

would place // comment as a leading comment on other rather than as a trailing comment on foo, but now it would be treated as a trailing comment on foo. If that is something that you feel is too aggressive, I'd be fine with just dropping it for now.

At the end of the day, if the only way to land this is a feature flag, I'm open to it, but I'd love to avoid that if at all possible.

loganfsmyth · 2018-11-29T05:28:58Z

I'll reopen once I have an update.

loganfsmyth added 4 commits November 25, 2018 19:25

Move the getExpression implementation onto the core parser.

269f4b5

Ensure that expression parses have a parent File exit.

f1f7a27

Fix path removal to distribute comments better.

7712bdf

Reimplement comment attachment to avoid duplication of comment nodes.

3a98471

loganfsmyth requested review from danez and nicolo-ribaudo November 26, 2018 03:58

nicolo-ribaudo reviewed Nov 26, 2018

View reviewed changes

nicolo-ribaudo added PR: Bug Fix 🐛 A type of pull request used for our changelog categories pkg: parser area: comments labels Nov 26, 2018

loganfsmyth closed this Nov 29, 2018

danez mentioned this pull request Dec 12, 2018

Comment parsing is not deterministic #9165

Closed

banga mentioned this pull request Sep 2, 2019

Refactor trailing comment adjustment #10380

Merged

lock bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 4, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reimplement parser comment attachment to avoid duplicates and generally improve it#9084

Reimplement parser comment attachment to avoid duplicates and generally improve it#9084
loganfsmyth wants to merge 4 commits intobabel:masterfrom
loganfsmyth:comment-dedup

loganfsmyth commented Nov 26, 2018

Uh oh!

babel-bot commented Nov 26, 2018

Uh oh!

babel-bot commented Nov 26, 2018

Uh oh!

danez commented Nov 26, 2018 •

edited

Loading

Uh oh!

danez commented Nov 26, 2018 •

edited

Loading

Uh oh!

danez commented Nov 26, 2018

Uh oh!

danez commented Nov 26, 2018 •

edited

Loading

Uh oh!

nicolo-ribaudo commented Nov 26, 2018 •

edited

Loading

Uh oh!

nicolo-ribaudo Nov 26, 2018

Uh oh!

danez commented Nov 27, 2018 •

edited

Loading

Uh oh!

loganfsmyth commented Nov 27, 2018

Uh oh!

loganfsmyth commented Nov 29, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

loganfsmyth commented Nov 26, 2018

Uh oh!

babel-bot commented Nov 26, 2018

Uh oh!

babel-bot commented Nov 26, 2018

Uh oh!

danez commented Nov 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danez commented Nov 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danez commented Nov 26, 2018

Uh oh!

danez commented Nov 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nicolo-ribaudo commented Nov 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nicolo-ribaudo Nov 26, 2018

Choose a reason for hiding this comment

Uh oh!

danez commented Nov 27, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

loganfsmyth commented Nov 27, 2018

Uh oh!

loganfsmyth commented Nov 29, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

danez commented Nov 26, 2018 •

edited

Loading

danez commented Nov 26, 2018 •

edited

Loading

danez commented Nov 26, 2018 •

edited

Loading

nicolo-ribaudo commented Nov 26, 2018 •

edited

Loading

danez commented Nov 27, 2018 •

edited

Loading