Skip to content

Reimplement parser comment attachment to avoid duplicates and generally improve it#9084

Closed
loganfsmyth wants to merge 4 commits intobabel:masterfrom
loganfsmyth:comment-dedup
Closed

Reimplement parser comment attachment to avoid duplicates and generally improve it#9084
loganfsmyth wants to merge 4 commits intobabel:masterfrom
loganfsmyth:comment-dedup

Conversation

@loganfsmyth
Copy link
Copy Markdown
Member

Q                       A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change? No-ish. We already had pretty bad guarantees about how comments behave, and I think this PR is a universal improvement.
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link
Any Dependency Changes?
License MIT

It's hard to say if this will affect anyone. The changes have the potential to cause comments to disappear if users use the new parser changes with the old babel-traverse. That said, we've never provided much in the way of guarantees for comments anyway. Chances are, any complaints would be resolved by ensuring users have upgraded both core/traverse and the parser. In 99% of cases I'd expect those to all be updated at the same time anyway.

This PR had one primary goal, which is to stop inserting comments into multiple locations in the AST. To do that though, I've tried to formalize a reasonable set of rules around how we attach comments in the first place, so people can have a reasonable intuition about it.

The rules come down to this:

  1. Leading comments are any comments that
    • Were not already consumed as trailing comments for a previous node
    • Come fully before the node.start offset
    • Will always be the shallowest node to satisfy the node.start
  2. Inner comments are any comments that
    • Show up within the node.start and node.end range of a node
    • Have not been attached to any child nodes, which I think means they'll only show up on nodes that literally had no children.
  3. Trailing comments are any comments that
    • Come after the last child node of a given parent, but are before the parent's parent.end
    • Come after a node but before one of ], ) or }
    • If the next sibling node is on the same line
      • Come immediately after a node, with no tokens between
    • If the next sibling node is not on the same line
      • Come immediately after a node, with no tokens between
      • Come immediately after a node and any trailing , or ; tokens on the same line as the node end

From what I've seen, these rules seem very satisfactory and fit well with my own intuition. I'm curious what others think.

@babel-bot
Copy link
Copy Markdown
Collaborator

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/9461/

1 similar comment
@babel-bot
Copy link
Copy Markdown
Collaborator

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/9461/

@danez
Copy link
Copy Markdown
Member

danez commented Nov 26, 2018

I will have a look at it soon. I know we had problems with changing locations of comments in the past and I experienced them in react-docgen so I will plug this PR into there and see what fails and if i can make the jsdoc parsing work correctly or it just happens to work. We can also try to run prettier with this change and see if their tests work.

@danez
Copy link
Copy Markdown
Member

danez commented Nov 26, 2018

Ok I plugged it into prettier and one test failed, but the recorded snapshots looks incorrect. The changed comment behavior in this PR seems to fix it 👍:

input:

const Component = branch/*::     <Props, ExternalProps> */(
  ({ src }) => !src,
  // $FlowFixMe
  renderNothing,
)(BaseComponent);

const C = b/*:: <A> */(foo) + c/*:: <B> */(bar);

foo/*::<bar>*/(baz);

foo/*::<bar>*/();

output diff:

- Expected
+ Received

 const Component = branch/*:: <Props, ExternalProps> */(
-   /*::     <Props, ExternalProps> */ ({ src }) => !src,
+   ({ src }) => !src,
    // $FlowFixMe
    renderNothing
 )(BaseComponent);
 
- const C = b/*:: <A> */(/*:: <A> */ foo) + c/*:: <B> */(/*:: <B> */ bar);
+ const C = b/*:: <A> */(foo) + c/*:: <B> */(bar);
 
- foo/*:: <bar> */(/*::<bar>*/ baz);
+ foo/*:: <bar> */(baz);
 
 foo/*:: <bar> */();

@danez
Copy link
Copy Markdown
Member

danez commented Nov 26, 2018

react-docgen has some problems, but only because it relies on the fact that comments are attached as trailing on the decorator and not only as leading on the id of the class or the classbody.
https://github.com/reactjs/react-docgen/blob/master/src/handlers/componentDocblockHandler.js#L39-L46

export default
@decorator
/** class description */
class A {}

I guess this can be fixed somehow, although I can't see an easy solution, as there is no location recorded in the AST where the class keyword is located.

so in the case of

export default /** Decorator description */ @decorator /** Description */ class /** Base */SomeName {}

it is possible to know where the first comment belongs to, but it is not really possible to know which of the other two comments come before and which after the class keyword and which react-docgen should treat as the description of the class. Both of the two comments are attached to the Identifier or the Classbody (if there is no id).

Any idea about this special case? It is the same case that broke in the past and whythis hack is present in react-docgen.

@danez
Copy link
Copy Markdown
Member

danez commented Nov 26, 2018

In general the ruleset for the comment attachment make sense to me and I feel having it clearly defined is really good.

I'm worried about the potential of breaking stuff and I think this change maybe should not be in a minor version. Not necessarily because of Babel users but for tools that use Babylon standalone. (As mentioned above)
Though maybe this change might be a good candidate for a feature flag as discussed in the last meeting. It could be enabled in Babel by default if we know it is not breaking the default Babel use cases.

I haven't looked at the code changes so far. Will do it tomorrow.

@nicolo-ribaudo
Copy link
Copy Markdown
Member

nicolo-ribaudo commented Nov 26, 2018

  1. Leading comments are any comments that
    • Were not already consumed as trailing comments for a previous node
    • Come fully before the node.start offset
    • Will always be the shallowest node to satisfy the node.start

I think that we should add another rule:

  • Are not separated from the current node by any token

This rule would change the behavior in cases like this one:

if /* a */ (x) {}

Currently /* x */ is a leading comment of the Identifier, but with the new rule it wouldn't be assigned to x and thus would become an inner comment of the IfStatement.

It would also fix this case, where /* x */ is considered a leading comment of the BlockStatement:

for (/* x */;;) {}

Apart from that, I think that these rules are pretty good. We should write them somewhere else other than in this PR so that they don't get lost.

I'll review the code later.


@danez

export default /** Decorator description */ @decorator /** Description */ class /** Base */SomeName {}

Special-casing that use-case would unfortunately break these:

/**
 * Class description
 */
@frozen class Foo {}

@frozen // Prevent modifications to the class
class Foo {}

// comments between nodes and these tokens automatically count as
// trailing comments.
groupEnd:
type === tt.bracketR || type === tt.braceR || type === tt.parenR,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add >, since it is used as a groupEnd for type parameters?

@nicolo-ribaudo nicolo-ribaudo added PR: Bug Fix 🐛 A type of pull request used for our changelog categories pkg: parser area: comments labels Nov 26, 2018
@danez
Copy link
Copy Markdown
Member

danez commented Nov 27, 2018

@nicolo-ribaudo

This rule would change the behavior in cases like this one:

if /* a */ (x) {}

Currently /* x */ is a leading comment of the Identifier, but with the new rule it wouldn't be assigned to x and thus would become an inner comment of the IfStatement.

I think this would actually also help with the example I posted if i understood it correctly:

/** leading on ExportDeclaration */
export default 
/** leading on Decorator */ 
@decorator 
/** inner on class because of class keyword between comment and identifier */
class /** leading on Identifier */SomeName 
/** leading on body */
{}

Is that correct?

@loganfsmyth
Copy link
Copy Markdown
Member Author

@nicolo-ribaudo

Currently /* x */ is a leading comment of the Identifier, but with the new rule it wouldn't be assigned to x and thus would become an inner comment of the IfStatement.

My aim with this change was to try to stay in the bounds of what we already do wherever possible. innerComments is very limited for now and I think that's realistically for the best. Since don't have a Concrete Syntax Tree, it's not at all clear how innerComments should behave in anything but the simplest of cases. I think this if probably is one of those cases, but I'd really love to avoid addings special cases for specific node types. I'm also consider moving to more innerComments to be more likely to qualify as breaking, since babel-generator only really expects innerComments in a few places.

It would also fix this case, where /* x */ is considered a leading comment of the BlockStatement:

for (/* x */;;) {}

The issue here is innerComments still isn't good enough resolution anyway. What would babel-generator do?

{
  type: "ForStatement",
  innerComments: [{ type: "CommentBlock", value: " x " }],
}

could serialize as any of

for /* x */(;;) {}
for (/* x */;;) {}
for (;/* x */;) {}
for (;;/* x */) {}

technically. My preference for this patch is to leave the preference for comments to attach as leading comments on the next-closest node.

Should we also add >, since it is used as a groupEnd for type parameters?

See below reply to @danez.

@danez

export default
@decorator
/** class description */
class A {}

This should be a trailing comment on the decorator, but it's a case I missed. I think this and the case above about > may point at a change I should make around how I handle sibling tokens. I actually wonder if I could avoid the groupEnd token ideal altogether. Let me explore and see if I run into any issues.

I'm worried about the potential of breaking stuff and I think this change maybe should not be in a minor version. Not necessarily because of Babel users but for tools that use Babylon standalone. (As mentioned above)
Though maybe this change might be a good candidate for a feature flag as discussed in the last meeting. It could be enabled in Babel by default if we know it is not breaking the default Babel use cases.

Yeah, I'm on the fence. I think partly it depends on how close we can get to the previous behavior. In an ideal worth, this patch would

  1. Avoid putting comments in two places
  2. Exclusively remove duplicates in the places where people didn't expect them to be anyway

You two have pointed out a big case I missed, so thanks a ton for that. I can also do another pass to see what cases might diverge from those 2 points. The logic I added around permeable is the only place I know of off the top of my head, since previously

[
  foo, // comment
  other
]

would place // comment as a leading comment on other rather than as a trailing comment on foo, but now it would be treated as a trailing comment on foo. If that is something that you feel is too aggressive, I'd be fine with just dropping it for now.

At the end of the day, if the only way to land this is a feature flag, I'm open to it, but I'd love to avoid that if at all possible.

@loganfsmyth
Copy link
Copy Markdown
Member Author

I'll reopen once I have an update.

@lock lock bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area: comments outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Bug Fix 🐛 A type of pull request used for our changelog categories

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants