Clarify guidelines in Parser.md by CyrusNajmabadi · Pull Request #56987 · dotnet/roslyn

CyrusNajmabadi · 2021-10-06T20:19:09Z

No description provided.

@jcouv

Draft for @jcouv review.

CyrusNajmabadi · 2021-10-08T18:29:28Z

cston · 2021-10-08T22:04:14Z

docs/compilers/Design/Parser.md

+There may be reasons to violate this rule. For example, the syntax model does not have a concept of precedence, but the parser uses precedence to decide how to assemble the tree. Precedence errors are therefore reported in the parser. As the syntactic shape of the tree is impacted here and we do not want to form an illegal tree, diagnostics at this layer are sensible.
+
+However, diagnostics should be avoided for cases where the syntactic model can be easily fitted to, but the rule is effectively a semantic one driven by external context. A good example of this is 'language version' checks. Outside of exceptional cases (like 'record parsing'), parsing is not actually affected by language version. Instead, the language version simply states whether the construct can be used, not whether it was recognized and parsed successfully into the syntax model. These checks should happen later, such as in the decl-table or binding passes of the compiler (see the DOs/DON'Ts section of this document below for reasons why). Note: an acceptable reason to still keep these checks at the parser level would be to avoid a simple and direct parsing check turning into a 'smeared out peanut butter' check higher up (e.g. where perhaps dozens of locations might need checks added). In that case, the negative cost to compiler maintainability would outweigh the positive benefits we get elsewhere.



It feels like the text changes in this PR can be simplified significantly. Please consider replacing the changes with the following. More than this seems unnecessary.

PREFER reporting diagnostics in binding when diagnostics do not affect the shape of the tree

The incremental parser does not reuse syntax nodes with diagnostics so it is preferable to report diagnostics in binding rather than parsing when diagnostics do not affect the shape of the syntax tree and when the cost of reporting diagnostics in binding or parsing is similar.
The common case is a syntax change introduced in a newer language version where the syntax tree shape is independent of language version.
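
As a rough illustration of the reuse rule described above, here is a toy Python model. It is not Roslyn code: `Node`, `build_tree`, and `count_reusable` are hypothetical names, and the only assumption carried over from the discussion is that a node cannot be reused when it, or anything beneath it, carries a diagnostic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy syntax node; 'diagnostics' holds parser-reported errors only."""
    kind: str
    children: List["Node"] = field(default_factory=list)
    diagnostics: List[str] = field(default_factory=list)

    @property
    def contains_diagnostics(self) -> bool:
        # A node is "dirty" if it or any descendant has a diagnostic.
        return bool(self.diagnostics) or any(
            child.contains_diagnostics for child in self.children
        )


def build_tree(report_in_parser: bool) -> Node:
    """Build the same tree shape, optionally attaching a (hypothetical)
    language-version diagnostic to one statement in the parser."""
    diag = ["feature not available in this language version"] if report_in_parser else []
    stmt_a = Node("Statement", diagnostics=diag)
    stmt_b = Node("Statement")
    method_a = Node("Method", [Node("Block", [stmt_a])])
    method_b = Node("Method", [Node("Block", [stmt_b])])
    return Node("CompilationUnit", [Node("Class", [method_a, method_b])])


def count_reusable(node: Node) -> int:
    """Count nodes that the toy incremental parser would be allowed to reuse."""
    own = 0 if node.contains_diagnostics else 1
    return own + sum(count_reusable(child) for child in node.children)


reusable_with_parser_diag = count_reusable(build_tree(report_in_parser=True))
reusable_with_binding_diag = count_reusable(build_tree(report_in_parser=False))
```

In this model a single parser diagnostic on one statement makes that statement and every ancestor ineligible for reuse, while the tree shape is identical in both cases. Reporting the same diagnostic in binding leaves every node eligible.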

cston · 2021-10-08T22:04:31Z

docs/compilers/Design/Parser.md


 There may be reasons to violate this rule, for example where the language is specified to be sensitive to context. For example, `await` is treated as a keyword if the enclosing method has the `async` modifier. 

+#### Impact of not following these DOs/DON'Ts


This section seems unnecessary.

I think this was necessary as the impact here was not well understood when conversing on this topic with other team members.

jcouv · 2022-02-15T21:44:57Z

Consider whether this PR is still necessary now that the relevant code was labelled with `// PREFER reporting diagnostics in binding when diagnostics do not affect the shape of the syntax tree`.

CyrusNajmabadi · 2022-02-16T23:06:15Z

I think it is still pretty relevant, especially for someone coming into the codebase. Explaining the 'whys' is important IMO to get a good grasp of the reason we care about this.

RikkiGibson · 2022-02-17T00:00:20Z

docs/compilers/Design/Parser.md

+
+1. The presence of ambient context greatly impacts the ability to do incremental parsing properly. Ambient context must be tracked in some fashion, and incremental reuse across disparate contexts leads to violating the invariant that incremental reparsing produces the same tree as normal parsing. This can lead to fundamental brokenness that is only solved by a host restart. As an example, consider an edit that adds 'async' to a method. This must necessarily affect reuse of nodes within the method, as they may be parsed separately. Every piece of ambient context can have this effect; it makes it dramatically harder to reason about incremental edits and can lead to subtle, hard-to-diagnose or hard-to-repro incremental errors in the wild.
+
+2. The presense of diagnostics beyond just `"missing token" and "unexpected token"` has impact again to the incremental parser.  Incremental parsing cannot reuse nodes that have errors in them, as it does not know what the root cause of the error was and if it would be fixed by an edit in a disparate part of the file.  As such, any additional diagnostic forces complete reparsing of that construct and all parent nodes above it.  This adds excess time to incremental parsing and can cause higher memory churn as well as new nodes must be created.  Both of these problems directly impact downstream consumption in IDE scenarios.  In the IDE typeing latency is a high value SLA target we need to meet.  However, many features must operate within the typing window in a manner that is 'correct' wrt to the syntax tree.  This includes, but is not limited to featuers like brace-insertion as well as indentation.  Both of these must happen near-instantiously to the user.  And both need an up-to-date SyntaxTree to determine what to do properly.  As such, these trees are retrieved synchronously using incremental parsing, so as little extra work as possible must be done so we can fit all the remainder of the work into the time-slice available.  Additional diagnostics interferes with this, adding extra unnecessary CPU and causing memory churn.


Suggested change

2. The presense of diagnostics beyond just `"missing token" and "unexpected token"` has impact again to the incremental parser. Incremental parsing cannot reuse nodes that have errors in them, as it does not know what the root cause of the error was and if it would be fixed by an edit in a disparate part of the file. As such, any additional diagnostic forces complete reparsing of that construct and all parent nodes above it. This adds excess time to incremental parsing and can cause higher memory churn as well as new nodes must be created. Both of these problems directly impact downstream consumption in IDE scenarios. In the IDE typeing latency is a high value SLA target we need to meet. However, many features must operate within the typing window in a manner that is 'correct' wrt to the syntax tree. This includes, but is not limited to featuers like brace-insertion as well as indentation. Both of these must happen near-instantiously to the user. And both need an up-to-date SyntaxTree to determine what to do properly. As such, these trees are retrieved synchronously using incremental parsing, so as little extra work as possible must be done so we can fit all the remainder of the work into the time-slice available. Additional diagnostics interferes with this, adding extra unnecessary CPU and causing memory churn.

2. The presence of diagnostics beyond just `"missing token" and "unexpected token"` has impact again to the incremental parser. Incremental parsing cannot reuse nodes that have errors in them, as it does not know what the root cause of the error was and if it would be fixed by an edit in a disparate part of the file. As such, any additional diagnostic forces complete reparsing of that construct and all parent nodes above it. This adds excess time to incremental parsing and can cause higher memory churn as well as new nodes must be created. Both of these problems directly impact downstream consumption in IDE scenarios. In the IDE typing latency is a high value SLA target we need to meet. However, many features must operate within the typing window in a manner that is 'correct' wrt to the syntax tree. This includes, but is not limited to features like brace-insertion as well as indentation. Both of these must happen near-instantaneously to the user. And both need an up-to-date SyntaxTree to determine what to do properly. As such, these trees are retrieved synchronously using incremental parsing, so as little extra work as possible must be done so we can fit all the remainder of the work into the time-slice available. Additional diagnostics interferes with this, adding extra unnecessary CPU and causing memory churn.

Spelling fixes :)
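
Item 1 of the quoted section (ambient context such as `async`) can be sketched with a toy Python model. This is an assumption-laden illustration, not the Roslyn parser: `parse_body` is a hypothetical function that classifies words, and the only rule it encodes is the one stated in the document, that `await` is treated as a keyword when the enclosing method has the `async` modifier.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)


def parse_body(words: List[str], in_async: bool) -> List[Token]:
    """Classify each word of a method body.

    'await' becomes a keyword only when the ambient context says the
    enclosing method is async; otherwise it is an ordinary identifier.
    """
    result: List[Token] = []
    for word in words:
        if word == "await" and in_async:
            result.append(("AwaitKeyword", word))
        else:
            result.append(("Identifier", word))
    return result


body = ["await", "x"]

# Nodes parsed before the edit, when the method was not async.
cached = parse_body(body, in_async=False)

# A full reparse after an edit adds 'async' to the method signature.
fresh = parse_body(body, in_async=True)

# Reusing 'cached' is only safe if it matches what a full parse produces.
reuse_is_safe = cached == fresh
```

The body text did not change, yet the cached result no longer matches a full parse, so blindly reusing it would break the invariant that incremental reparsing produces the same tree as normal parsing.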

RikkiGibson · 2022-02-17T00:04:11Z

IMO we should just merge as long as we don't find any of the documentation to be wrong/misleading/distracting. An imperfect document that includes useful info is better than the absence of it.

CyrusNajmabadi · 2022-03-17T17:22:41Z

/azp run

azure-pipelines · 2022-03-17T17:22:53Z

Azure Pipelines successfully started running 1 pipeline(s).

Update Parser.md

c8ce128

Draft for @jcouv review.

ghost added the Area-Compilers label Oct 6, 2021

CyrusNajmabadi requested a review from jcouv October 6, 2021 20:19

CyrusNajmabadi added 3 commits October 6, 2021 13:24

Update Parser.md

62688ed

Update Parser.md

eb68e8d

Update Parser.md

354f3a7

CyrusNajmabadi marked this pull request as draft October 8, 2021 20:45

CyrusNajmabadi marked this pull request as ready for review October 8, 2021 20:46

cston reviewed Oct 8, 2021

RikkiGibson reviewed Feb 17, 2022

RikkiGibson approved these changes Feb 17, 2022

jcouv approved these changes Mar 17, 2022

jcouv changed the title ~~Update Parser.md~~ Clarify guidelines in Parser.md Mar 17, 2022

jcouv enabled auto-merge (squash) March 17, 2022 16:44

jcouv merged commit 02d1ce5 into main Mar 17, 2022

ghost added this to the Next milestone Mar 17, 2022

CyrusNajmabadi deleted the CyrusNajmabadi-patch-1 branch March 17, 2022 17:25

allisonchou modified the milestones: Next, 17.2.P3 Mar 28, 2022

jcouv merged 4 commits into main from CyrusNajmabadi-patch-1

5 participants
