Skip to content

Support reporting more accurate 'changed spans' when doing syntactic classification#52472

Merged
CyrusNajmabadi merged 46 commits intodotnet:mainfrom
CyrusNajmabadi:classificationDiff
Apr 14, 2021
Merged

Support reporting more accurate 'changed spans' when doing syntactic classification#52472
CyrusNajmabadi merged 46 commits intodotnet:mainfrom
CyrusNajmabadi:classificationDiff

Conversation

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

@CyrusNajmabadi CyrusNajmabadi commented Apr 7, 2021

Needs a benchmark.

@ghost ghost added the Area-IDE label Apr 7, 2021
/// incrementally identical, all children of each node will be incrementally identical as well.
/// </summary>
public bool IsIncrementallyIdenticalTo(SyntaxNode other)
=> this.Green == other.Green;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paired methods in compiler layer with the existing IsEquivalentTo.

}

// Couldn't compute a narrower range. Just the mark the entire file as changed.
return currentSnapshot.GetFullSpan();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we fallback to old behavior if we can't get a narrower region. this will also be the behavior for F#. (TS doesn't matter as they don't use our classification system, they use TextMate).

// We want to compute a minimal change, but we don't want this to run for too long. So do the
// computation work in the threadpool, but also gate how much time we can spend here so that we can let
// the editor know about the size of the change asap.
using var linkedToken = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wasn't comfortable with the idea of us just spending arbitrary time computing the diff. though it may not be an issue in practice as i imagine our diffing algorithm (later in this pr) should be blisteringly fast.

linkedToken.Cancel();

// ensure that if we completed because of cancellation, we throw that up.
cancellationToken.ThrowIfCancellationRequested();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about this canclelation handshake. i always find linked tokens a bit clunky.

}

public async Task<TextChangeRange?> ComputeSyntacticChangeRangeAsync(Document oldDocument, Document newDocument, CancellationToken cancellationToken)
=> await SyntacticChangeRangeComputer.ComputeSyntacticChangeRangeAsync(oldDocument, newDocument, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async/await needed here as the helper returns the non-nullable version.

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review April 7, 2021 22:33
@CyrusNajmabadi CyrusNajmabadi requested review from a team as code owners April 7, 2021 22:33
Microsoft.CodeAnalysis.SyntaxContextReceiverCreator
Microsoft.CodeAnalysis.SyntaxNode.IsIncrementallyIdenticalTo(Microsoft.CodeAnalysis.SyntaxNode! other) -> bool
Microsoft.CodeAnalysis.SyntaxNodeOrToken.IsIncrementallyIdenticalTo(Microsoft.CodeAnalysis.SyntaxNodeOrToken other) -> bool
Microsoft.CodeAnalysis.SyntaxToken.IsIncrementallyIdenticalTo(Microsoft.CodeAnalysis.SyntaxToken token) -> bool
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dotnet/roslyn-compiler these are new APIs i'm adding to effectively ask "do these nodes have identical green nodes". It's a way to dtermine extremely quickly what parts of a tree were definitely reused across an incremental edit..

This is used in this PR to take two trees, and conservatively (not minimally) determine the portion of the tree that changed. For any green nodes that are identical between the before/after we know that that didn't change. So we only need to examine nodes that did change. See the logic in SyntacticChangeRangeComputer below for how we utilize this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these were added to the public API, I would prefer them be added as static methods in a helper class which are not extension methods so they aren't showing up for users working with these types in common scenarios.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair point. But I'm not sure a good static class these should go to. There is also precedence here as these three types already have IsEquivalentTo. So this really is a sibling method to those. But I would be fine moving these of people have a good suggestion on where it should go

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, this seems fine to me as a location for the API. As Cyrus mentioned, we already have IsEquivalentTo APIs, and those aren't common-case APIs either.

@sharwell
Copy link
Copy Markdown
Contributor

sharwell commented Apr 8, 2021

@CyrusNajmabadi any chance we can add a benchmark for this?

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

any chance we can add a benchmark for this?

Definitely!

@sharwell sharwell marked this pull request as draft April 8, 2021 15:08
@sharwell

This comment has been minimized.

@CyrusNajmabadi

This comment has been minimized.

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review April 8, 2021 15:34
@sharwell

This comment has been minimized.

@CyrusNajmabadi

This comment has been minimized.

Comment thread src/Compilers/Core/Portable/Syntax/SyntaxNode.cs Outdated
Copy link
Copy Markdown
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler changes LGTM Thanks (iteration 36) with a nit (missing annotation)

}

// Couldn't compute a narrower range. Just the mark the entire file as changed.
return currentSnapshot.GetFullSpan();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future people, what is the advice on using GetChangedSpanAsync over currentSnapshot.GetFullSpan in this class?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively (and i will doc this) no one should use GetFullSpan except for thsi code.

{
// If they're the same doc, there is no change.
if (oldDocument == newDocument)
return new TextChangeRange();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new TextChangeRange();
return new();

Tiny nit

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my preference is to use new() in places that are immediatley apparent. So with X x = new() or X Foo() => new(). Outside of that, i'm currently using sparingly :)

{
private static readonly ObjectPool<Stack<SyntaxNodeOrToken>> s_pool = new(() => new());

public static async ValueTask<TextChangeRange?> ComputeSyntacticChangeRangeAsync(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear why we allow returning null here. It looks like cases where we timeout waiting on the syntax roots. We still had to compute them unless the cancellation token cancelled, can we use that to return a non-null value here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. We may not have canceled, but we may have otherwise exceeded out time budget. "Cancellation" in these codepaths is for saying: no, stop everything, all results are useless as the client has moved on.

The diffTimeout is different. It's: we still need the results, but we need the fast, so don't do superflous work.

Does that make sense?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I meant that exceeding the time budge can return null if the root wasn't retrieved in time. It seems odd we have null in that case, sense we're saying "Assume both of these are completely different", based on the comment. An actual result with the whole tree being different would be easier to understand and consume than guessing what null meant. Is it no difference? Is it all different?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An actual result with the whole tree being different would be easier to understand and consume than guessing what null meant. Is it no difference? Is it all different?

I see what you mean. To do this, we'd still need the sizes of both the old and new tree. And that means we'd have to then get both roots. I found it easier to just return null trivially and have the caller know that means that the narrow range could not be computed :)

@CyrusNajmabadi CyrusNajmabadi requested a review from ryzngard April 14, 2021 02:58
Copy link
Copy Markdown
Contributor

@ryzngard ryzngard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDE parts LGTM

@CyrusNajmabadi CyrusNajmabadi merged commit 8ce6cf4 into dotnet:main Apr 14, 2021
@ghost ghost added this to the Next milestone Apr 14, 2021
@CyrusNajmabadi CyrusNajmabadi deleted the classificationDiff branch April 14, 2021 05:27
Copy link
Copy Markdown
Member

@jasonmalinowski jasonmalinowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

/// </para>
/// </summary>
ValueTask<TextChangeRange?> ComputeSyntacticChangeRangeAsync(
Document oldDocument, Document newDocument, TimeSpan timeout, CancellationToken cancellationToken);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any telemetry for how long this is taking in practice? I support the idea of the timeout, but if in practice it turns out entirely unnecessary I'm wondering if it's just better to delete.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern i have is that this is a case where i:

  1. expect the normal case to be fast
  2. do not want the uncommon case to negatively impact the user experience

This is hard to really get telemetry on. We'd need to collect info and then know that we truly were never really running into the slow case.

Comment thread src/Workspaces/Core/Portable/Workspace/Solution/Document.cs
Comment on lines +253 to +255
// doc we grab, just that we grab some prior version. This is only used to narrow down the changed range we
// specify, so it's ok if it's slightly larger because we read in a change from a couple of edits ago.
var previousDocument = _lastProcessedDocument;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I'm not quite understanding this comment. Is there any concern that an edit which is undone might somehow cause problems here? Put another way, if we go from version A to B, and B to C (where C is basically A again), could the compiler's caching and reuse of tokens mean A and C could be the same even though we reported tags for B?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. this has a bad race condition. not sure how i convinced myself this was safe. will have followup pr.

var oldRoot = await oldDocument.GetRequiredSyntaxRootAsync(cancellationToken).ConfigureAwait(false);

// If we ran out of time, we have to assume both are completely different.
if (stopwatch.Elapsed > timeout)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit odd to me that we're including the fetching of roots in the timeout when we already did that at the caching layer higher. I get the desire that the language service returns a generic "cached object" but it's a bit strange to have timeout code here running which we expect to be instantaneous. Also a bit strange that the "this is designed to be fast" algorithm is an async method in the first place, since the only way it's ever fast is if the caller already knows the roots are available.

Comment on lines +73 to +81
using var leftOldStack = s_pool.GetPooledObject();
using var leftNewStack = s_pool.GetPooledObject();
using var rightOldStack = s_pool.GetPooledObject();
using var rightNewStack = s_pool.GetPooledObject();

leftOldStack.Object.Push(oldRoot);
leftNewStack.Object.Push(newRoot);
rightOldStack.Object.Push(oldRoot);
rightNewStack.Object.Push(newRoot);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stacks are only used in the implementations of the local methods? Just move them in? That way we're not allocating ones which may not be used (the right stacks), and it probably also means the local functions can become static?

Comment on lines +165 to +168
// Similarly, if we've run out of time, just return what we've computed so far. It's not as accurate as
// we could be. But the caller wants the results asap.
if (stopwatch.Elapsed > timeout)
return currentOld.FullSpan.Start;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that we're able to give a "good enough" answer like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants