Skip to content

Simplify 'interpolation' data, and move to an easier to consume System.Range approach for it#57966

Merged
CyrusNajmabadi merged 27 commits intodotnet:mainfrom
CyrusNajmabadi:simplifyInterpolationPArsing4
Nov 30, 2021
Merged

Simplify 'interpolation' data, and move to an easier to consume System.Range approach for it#57966
CyrusNajmabadi merged 27 commits intodotnet:mainfrom
CyrusNajmabadi:simplifyInterpolationPArsing4

Conversation

@CyrusNajmabadi
Copy link
Contributor

@CyrusNajmabadi CyrusNajmabadi commented Nov 24, 2021

Followup to #57945. This changes the interpolation struct to be simpler and not contain information that can be computed when the interpolation is needed.

It also moves the internal data to be in terms of Ranges instead of positions for things like where the { and } are in an interpolation. This is needed for raw-interpolated-strings as those braces may be more than one character long. Usage of spans also makes length/substring computation trivial. Prior to this we had helpers that took in two positions but which was inclusive on both. This lead to a lot of complex math and didn't follow any of the slicing math patterns we use everywhere else in roslyn. Switching to Range just makes all this complexity fall out.

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner November 24, 2021 06:46
@ghost ghost added the Area-Compilers label Nov 24, 2021
@CyrusNajmabadi CyrusNajmabadi changed the title Simplify interpolation parsing Simplify 'interpolation' data, and move to an easier to consume data model for it Nov 24, 2021
internal partial class LanguageParser
{
/// <summary>
/// "Safe" substring using start and end positions rather than start and length.
Copy link
Contributor Author

@CyrusNajmabadi CyrusNajmabadi Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, 'safe' is crazy here. We should never be in a situation where we need to munge positions. Everything shoudl be known with exact locations and we should always be able to safely use the data collected. This approach to specifying inclusive end positions, but then rectifying them is just bad.

Second, using inclusive for start/end just violated all of our (and the BCL) patterns around ranges/spans. Nothing else works this way (esp. string slicing), so we really don't want to do this.

return (last > s.Length || len <= 0) ? string.Empty : s.Substring(first, len);
}
private static string Substring(string str, TextSpan span)
=> str.Substring(span.Start, span.Length);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No fuss, no muss. You ask for a slice, you get it.

// with no inserts. We must still use String.Format to get its handling of escapes such as {{,
// so we still treat it as a composite format string.
var text = Substring(originalText, openQuoteIndex + 1, closeQuoteIndex - 1);
var text = Substring(originalText, TextSpan.FromBounds(openQuoteIndex + 1, closeQuoteIndex));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for weird -1. THis is the standard inclusive/exclusive slice logic.

var text = Substring(originalText,
TextSpan.FromBounds(
i == 0 ? openQuoteIndex + 1 : interpolations[i - 1].CloseBraceSpan.End,
interpolation.OpenBraceSpan.Start));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of needing +1 and -1 on the curly locations, we can actually just use known concepts like .End and .Start to specify what we care about. (we should probably do this for the openQuote part as well, but i haven't had need to just yet).

return CheckFeatureAvailability(result, MessageID.IDS_FeatureInterpolatedStrings);
}

private static InterpolationSyntax ParseInterpolation(CSharpParseOptions options, string text, Lexer.Interpolation interpolation, bool isVerbatim)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made static so it's clear that you can't call instance methods both on the original 'parser' instance and the 'tempParser' created within.

using (var tempLexer = new Lexer(Text.SourceText.From(parsedText), this.Options, allowPreprocessorDirectives: false, interpolationFollowedByColon: interpolation.HasColon))
var openBraceToken = this.EatToken(SyntaxKind.OpenBraceToken);
var (expression, alignment) = getExpressionAndAlignment();
var (format, closeBraceToken) = getFormatAndCloseBrace();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complex interpolation parsing was broken into it's three successive segments. The parts taht return pairs of items do so as there's a relation between those two pieces (esp. wrt to how trivia is associated with either part).

#endif
return result;

(ExpressionSyntax expression, InterpolationAlignmentClauseSyntax alignment) getExpressionAndAlignment()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff is awful. i'm not sure a good way to review before/after. i recommend just trying to understand the after part.

@CyrusNajmabadi CyrusNajmabadi changed the title Simplify 'interpolation' data, and move to an easier to consume data model for it Simplify 'interpolation' data, and move to an easier to consume TextSpan approach for it Nov 24, 2021
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review November 24, 2021 19:38
@CyrusNajmabadi
Copy link
Contributor Author

@RikkiGibson @jcouv @chsienki this is ready for review.

@CyrusNajmabadi CyrusNajmabadi marked this pull request as draft November 24, 2021 19:57
@CyrusNajmabadi CyrusNajmabadi changed the title Simplify 'interpolation' data, and move to an easier to consume TextSpan approach for it Simplify 'interpolation' data, and move to an easier to consume System.Range approach for it Nov 24, 2021
// with no inserts. We must still use String.Format to get its handling of escapes such as {{,
// so we still treat it as a composite format string.
var text = Substring(originalText, openQuoteIndex + 1, closeQuoteIndex - 1);
var text = originalText[new Range(openQuoteIndex + 1, closeQuoteIndex)];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for -1, Range math is sane math.

Note: we might want to move the open-quote part to be a range as well, but i haven't needed that so far.

var text = Substring(originalText, (i == 0) ? (openQuoteIndex + 1) : (interpolations[i - 1].CloseBracePosition + 1), interpolation.OpenBracePosition - 1);
var text = originalText[new Range(
i == 0 ? openQuoteIndex + 1 : interpolations[i - 1].CloseBraceRange.End,
interpolation.OpenBraceRange.Start)];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range math is much saner. you can just use .Start and .End easily and have things just make sense.

@jcouv
Copy link
Member

jcouv commented Nov 29, 2021

                    var errorCode = this.ScanVerbatimStringLiteral(ref info);

There's a bit of an asymmetry which I didn't understand. In the @ case (here), we call ScanVerbatimStringLiteral or ScanInterpolatedStringLiteral. But in the $ case (line 785), we use TryScanInterpolatedString which always does ScanInterpolatedStringLiteral.


In reply to: 982008924


Refers to: src/Compilers/CSharp/Portable/Parser/Lexer.cs:767 in bb55d90. [](commit_id = bb55d90, deletion_comment = False)

_isVerbatim = isVerbatim;

_isVerbatim = (lexer.TextWindow.PeekChar(0) == '$' && lexer.TextWindow.PeekChar(1) == '@') ||
(lexer.TextWindow.PeekChar(0) == '@' && lexer.TextWindow.PeekChar(1) == '$');
Copy link
Member

@jcouv jcouv Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad we can't do something like is ['$', '@', ..] or ['@', '$', ..], because we probably don't want to pull on the length here, but it would look really cool ;-) #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup yup.

@jcouv jcouv self-assigned this Nov 29, 2021
@CyrusNajmabadi CyrusNajmabadi requested a review from jcouv November 30, 2021 00:47
@CyrusNajmabadi
Copy link
Contributor Author

There's a bit of an asymmetry

Yup. it's asymmetric. I'm happy to try to unify those in followup as well.

@jcouv
Copy link
Member

jcouv commented Nov 30, 2021

Yup. it's asymmetric.

Is it a bug, or just some redundant code?

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 25)

using var tempLexer = new Lexer(SourceText.From(originalText), this.Options, allowPreprocessorDirectives: false);
var info = default(Lexer.TokenInfo);
tempLexer.ScanInterpolatedStringLiteralTop(interpolations, isVerbatim, ref info, out error, out closeQuoteMissing);
tempLexer.ScanInterpolatedStringLiteralTop(ref info, out error, out openQuoteRange, interpolations, out closeQuoteRange);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interpolations

Is this the only thing that this local function needs from the surrounding context? If so, consider making the function static and passing this in. It's far enough away from the usage that it's not immediately obvious that interpolations is used in here when reading the main method body, and since the builder is immediately freed there's some refactoring risk there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i wasn't sure the best structure here. i've changed it uip a bit now to hopefully make it a bit clearer. now the data that is in/out should be more obvious/relevant (i think).

bool closeQuoteMissing;
using (var tempLexer = new Lexer(Text.SourceText.From(originalText), this.Options, allowPreprocessorDirectives: false))

rescanInterpolation(out var openQuoteRange, out var error, out var closeQuoteRange);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openQuoteRange

Can we make any assertions that these quote ranges are within the original string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can. though we'll also horrifically crash if they're not since we slice using those ranges. so teh assertion won't be doing much more than just precrashing :)

@CyrusNajmabadi
Copy link
Contributor Author

Is it a bug, or just some redundant code?

i believe the latter.

@333fred
Copy link
Member

333fred commented Nov 30, 2021

Done review pass (commit 25). Just a couple of minor comments.

@CyrusNajmabadi
Copy link
Contributor Author

There's a bit of an asymmetry which I didn't understand.

Ok, fixed that :) was needed in the past, but not anymore with lots of the refactoring that have gone on here :)

@CyrusNajmabadi CyrusNajmabadi enabled auto-merge (squash) November 30, 2021 02:03
@CyrusNajmabadi
Copy link
Contributor Author

Ok. I'm going to try this as a squash. :)

@jcouv jcouv self-requested a review November 30, 2021 02:05
Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 17)

@CyrusNajmabadi CyrusNajmabadi merged commit 2df14d4 into dotnet:main Nov 30, 2021
@ghost ghost added this to the Next milestone Nov 30, 2021
@allisonchou allisonchou modified the milestones: Next, 17.1.P2 Nov 30, 2021
333fred added a commit to 333fred/roslyn that referenced this pull request Dec 3, 2021
…rovements

* upstream/main: (310 commits)
  Read SourceLink info and call service to retrieve source from there (dotnet#57978)
  Add new parser/lexer to the StackTraceAnalyzer (dotnet#57598) (dotnet#58050)
  Snap 17.1 P2 (dotnet#58041)
  Make it possible to analyze the dataflow of `ConstructorInitializerSyntax` and `PrimaryConstructorBaseTypeSyntax` (dotnet#57576)
  Shorten paths in VS installation (dotnet#57726)
  Add comments
  Add new parser/lexer to the StackTraceAnalyzer (dotnet#57598)
  Fix await completion for expression body lambda
  Add tests
  Fix comment
  Honor option, and also improve formatting with comment
  Skip TestLargeStringConcatenation (dotnet#58035)
  Log runtime framework of remote host
  Mark EqualityContract property accessor as not auto-implemented (dotnet#57917)
  Fix typo in XML doc for GeneratorExtensions (dotnet#58020)
  Hold Receiver directly in bound node for implicit indexer access (dotnet#58009)
  Pass AnalysisKind instead of int
  Enable nullable reference types for TableDataSource
  Simplify 'interpolation' data, and move to an easier to consume System.Range approach for it (dotnet#57966)
  Add missing test for CallerArgumentExpression (dotnet#57805)
  ...
@CyrusNajmabadi CyrusNajmabadi deleted the simplifyInterpolationPArsing4 branch February 1, 2022 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants