Skip to content

Clarify guidelines in Parser.md#56987

Merged
jcouv merged 4 commits intomainfrom
CyrusNajmabadi-patch-1
Mar 17, 2022
Merged

Clarify guidelines in Parser.md#56987
jcouv merged 4 commits intomainfrom
CyrusNajmabadi-patch-1

Conversation

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

@CyrusNajmabadi CyrusNajmabadi commented Oct 6, 2021

No description provided.

Draft for @jcouv review.
@ghost ghost added the Area-Compilers label Oct 6, 2021
@CyrusNajmabadi CyrusNajmabadi requested a review from jcouv October 6, 2021 20:19
@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

pping @jcouv

@CyrusNajmabadi CyrusNajmabadi marked this pull request as draft October 8, 2021 20:45
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review October 8, 2021 20:46
There may be reasons to violate this rule. For example, the syntax model does not have a concept of precedence, but the parser uses precedence to decide how to assemble the tree. Precedence errors are therefore reported in the parser. As the syntactic shape of the tree is impacted here and we do not want to form illegal tree, diagnostics at this layer are sensible.

However, diagnostics should be avoided for cases where the syntactic model can be easily fitted to, but the rule is effectively a semantic one driven by external context. A good example of this are 'language version' checks. Outside of exceptional cases (like 'record parsing') parsing is not actually affected by language-version. Instead, the lang version simply states if the construct can be used or not, not if it was recognized and parsed successfully into the syntax model. These checks should happen later, like in the decl-table or binding passed of the compiler (see the DOs/DON'Ts section of this document below for reasons why). Note: an acceptable reason to still keep these checks at the parser level would be to avoid a simple and direct parsing check turning into a 'smeared out peanut butter' check higher up (e.g. where perhaps dozens of locations might need checks added). In that case, the negative cost to compiler maintainability would outweigh the positive benefits we get elsewhere.

Copy link
Copy Markdown
Contributor

@cston cston Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like the text changes in this PR can be simplified significantly. Please consider replacing the changes with the following. More than this seems unnecessary.

PREFER reporting diagnostics in binding when diagnostics do not affect the shape of the tree

The incremental parser does not reuse syntax nodes with diagnostics so it is preferable to report diagnostics in binding rather than parsing when diagnostics do not affect the shape of the syntax tree and when the cost of reporting diagnostics in binding or parsing is similar.
The common case is a syntax change introduced in a newer language version where the syntax tree shape is independent of language version.


There may be reasons to violate this rule, for example where the language is specified to be sensitive to context. For example, `await` is treated as a keyword if the enclosing method has the `async` modifier.

#### Impact of not following these DOs/DON'Ts
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems unnecessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was necessary as the impact here was not well understood when conversing on this topic with other team members.

@jcouv
Copy link
Copy Markdown
Member

jcouv commented Feb 15, 2022

Consider whether this PR is still necessary now that the relevant code was labelled with // PREFER reporting diagnostics in binding when diagnostics do not affect the shape of the syntax tree

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

I thnik it is still pretty relevant. Esp for someone coming into the codebase. Explaining the 'whys' is important IMO to get a good grasp of the reason we care about this.


1. The presence of ambient context greatly impacts the ability to do incremental parsing properly. Ambient context must be tracked in some fashion and incremental reuse across disparate contexts leads to violating the invariant that incremental reparsing produces the same tree as normal parsing. This can and lead to fundamental brokenness that is only solved by a host restart. As an example, consider an edit that adds 'async' to a method. This must necessarily affect reuse of nodes within the method as they may be parsed separately. Every piece of ambient context can have this effect and it dramatically makes it harder to reason about incremental edits and can lead to subtle and hard to diagnose or repro incremental errors in the wild.

2. The presense of diagnostics beyond just `"missing token" and "unexpected token"` has impact again to the incremental parser. Incremental parsing cannot reuse nodes that have errors in them, as it does not know what the root cause of the error was and if it would be fixed by an edit in a disparate part of the file. As such, any additional diagnostic forces complete reparsing of that construct and all parent nodes above it. This adds excess time to incremental parsing and can cause higher memory churn as well as new nodes must be created. Both of these problems directly impact downstream consumption in IDE scenarios. In the IDE typeing latency is a high value SLA target we need to meet. However, many features must operate within the typing window in a manner that is 'correct' wrt to the syntax tree. This includes, but is not limited to featuers like brace-insertion as well as indentation. Both of these must happen near-instantiously to the user. And both need an up-to-date SyntaxTree to determine what to do properly. As such, these trees are retrieved synchronously using incremental parsing, so as little extra work as possible must be done so we can fit all the remainder of the work into the time-slice available. Additional diagnostics interferes with this, adding extra unnecessary CPU and causing memory churn.
Copy link
Copy Markdown
Member

@RikkiGibson RikkiGibson Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. The presense of diagnostics beyond just `"missing token" and "unexpected token"` has impact again to the incremental parser. Incremental parsing cannot reuse nodes that have errors in them, as it does not know what the root cause of the error was and if it would be fixed by an edit in a disparate part of the file. As such, any additional diagnostic forces complete reparsing of that construct and all parent nodes above it. This adds excess time to incremental parsing and can cause higher memory churn as well as new nodes must be created. Both of these problems directly impact downstream consumption in IDE scenarios. In the IDE typeing latency is a high value SLA target we need to meet. However, many features must operate within the typing window in a manner that is 'correct' wrt to the syntax tree. This includes, but is not limited to featuers like brace-insertion as well as indentation. Both of these must happen near-instantiously to the user. And both need an up-to-date SyntaxTree to determine what to do properly. As such, these trees are retrieved synchronously using incremental parsing, so as little extra work as possible must be done so we can fit all the remainder of the work into the time-slice available. Additional diagnostics interferes with this, adding extra unnecessary CPU and causing memory churn.
2. The presence of diagnostics beyond just `"missing token" and "unexpected token"` has impact again to the incremental parser. Incremental parsing cannot reuse nodes that have errors in them, as it does not know what the root cause of the error was and if it would be fixed by an edit in a disparate part of the file. As such, any additional diagnostic forces complete reparsing of that construct and all parent nodes above it. This adds excess time to incremental parsing and can cause higher memory churn as well as new nodes must be created. Both of these problems directly impact downstream consumption in IDE scenarios. In the IDE typing latency is a high value SLA target we need to meet. However, many features must operate within the typing window in a manner that is 'correct' wrt to the syntax tree. This includes, but is not limited to features like brace-insertion as well as indentation. Both of these must happen near-instantaneously to the user. And both need an up-to-date SyntaxTree to determine what to do properly. As such, these trees are retrieved synchronously using incremental parsing, so as little extra work as possible must be done so we can fit all the remainder of the work into the time-slice available. Additional diagnostics interferes with this, adding extra unnecessary CPU and causing memory churn.

Spelling fixes :)

@RikkiGibson
Copy link
Copy Markdown
Member

IMO we should just merge as long as we don't find any of the documentation to be wrong/misleading/distracting. An imperfect document that includes useful info is better than the absence of it.

@jcouv jcouv changed the title Update Parser.md Clarify guidelines in Parser.md Mar 17, 2022
@jcouv jcouv enabled auto-merge (squash) March 17, 2022 16:44
@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

/azp run

@jcouv jcouv merged commit 02d1ce5 into main Mar 17, 2022
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@ghost ghost added this to the Next milestone Mar 17, 2022
@CyrusNajmabadi CyrusNajmabadi deleted the CyrusNajmabadi-patch-1 branch March 17, 2022 17:25
@allisonchou allisonchou modified the milestones: Next, 17.2.P3 Mar 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants