Support reporting more accurate 'changed spans' when doing syntactic classification#52472
Conversation
… to make intent clear.
| /// incrementally identical, all children of each node will be incrementally identical as well. | ||
| /// </summary> | ||
| public bool IsIncrementallyIdenticalTo(SyntaxNode other) | ||
| => this.Green == other.Green; |
There was a problem hiding this comment.
paired methods in compiler layer with the existing IsEquivalentTo.
| } | ||
|
|
||
| // Couldn't compute a narrower range. Just the mark the entire file as changed. | ||
| return currentSnapshot.GetFullSpan(); |
There was a problem hiding this comment.
we fallback to old behavior if we can't get a narrower region. this will also be the behavior for F#. (TS doesn't matter as they don't use our classification system, they use TextMate).
| // We want to compute a minimal change, but we don't want this to run for too long. So do the | ||
| // computation work in the threadpool, but also gate how much time we can spend here so that we can let | ||
| // the editor know about the size of the change asap. | ||
| using var linkedToken = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); |
There was a problem hiding this comment.
i wasn't comfortable with the idea of us just spending arbitrary time computing the diff. though it may not be an issue in practice as i imagine our diffing algorithm (later in this pr) should be blisteringly fast.
| linkedToken.Cancel(); | ||
|
|
||
| // ensure that if we completed because of cancellation, we throw that up. | ||
| cancellationToken.ThrowIfCancellationRequested(); |
There was a problem hiding this comment.
not sure about this canclelation handshake. i always find linked tokens a bit clunky.
| } | ||
|
|
||
| public async Task<TextChangeRange?> ComputeSyntacticChangeRangeAsync(Document oldDocument, Document newDocument, CancellationToken cancellationToken) | ||
| => await SyntacticChangeRangeComputer.ComputeSyntacticChangeRangeAsync(oldDocument, newDocument, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
async/await needed here as the helper returns the non-nullable version.
| Microsoft.CodeAnalysis.SyntaxContextReceiverCreator | ||
| Microsoft.CodeAnalysis.SyntaxNode.IsIncrementallyIdenticalTo(Microsoft.CodeAnalysis.SyntaxNode! other) -> bool | ||
| Microsoft.CodeAnalysis.SyntaxNodeOrToken.IsIncrementallyIdenticalTo(Microsoft.CodeAnalysis.SyntaxNodeOrToken other) -> bool | ||
| Microsoft.CodeAnalysis.SyntaxToken.IsIncrementallyIdenticalTo(Microsoft.CodeAnalysis.SyntaxToken token) -> bool |
There was a problem hiding this comment.
@dotnet/roslyn-compiler these are new APIs i'm adding to effectively ask "do these nodes have identical green nodes". It's a way to dtermine extremely quickly what parts of a tree were definitely reused across an incremental edit..
This is used in this PR to take two trees, and conservatively (not minimally) determine the portion of the tree that changed. For any green nodes that are identical between the before/after we know that that didn't change. So we only need to examine nodes that did change. See the logic in SyntacticChangeRangeComputer below for how we utilize this.
There was a problem hiding this comment.
If these were added to the public API, I would prefer them be added as static methods in a helper class which are not extension methods so they aren't showing up for users working with these types in common scenarios.
There was a problem hiding this comment.
It's a fair point. But I'm not sure a good static class these should go to. There is also precedence here as these three types already have IsEquivalentTo. So this really is a sibling method to those. But I would be fine moving these of people have a good suggestion on where it should go
There was a problem hiding this comment.
Honestly, this seems fine to me as a location for the API. As Cyrus mentioned, we already have IsEquivalentTo APIs, and those aren't common-case APIs either.
|
@CyrusNajmabadi any chance we can add a benchmark for this? |
Definitely! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
jcouv
left a comment
There was a problem hiding this comment.
Compiler changes LGTM Thanks (iteration 36) with a nit (missing annotation)
| } | ||
|
|
||
| // Couldn't compute a narrower range. Just the mark the entire file as changed. | ||
| return currentSnapshot.GetFullSpan(); |
There was a problem hiding this comment.
For future people, what is the advice on using GetChangedSpanAsync over currentSnapshot.GetFullSpan in this class?
There was a problem hiding this comment.
Effectively (and i will doc this) no one should use GetFullSpan except for thsi code.
…cClassificationTaggerProvider.TagComputer.cs Co-authored-by: Andrew Hall <ryzngard@live.com>
…i/roslyn into classificationDiff
| { | ||
| // If they're the same doc, there is no change. | ||
| if (oldDocument == newDocument) | ||
| return new TextChangeRange(); |
There was a problem hiding this comment.
| return new TextChangeRange(); | |
| return new(); |
Tiny nit
There was a problem hiding this comment.
my preference is to use new() in places that are immediatley apparent. So with X x = new() or X Foo() => new(). Outside of that, i'm currently using sparingly :)
| { | ||
| private static readonly ObjectPool<Stack<SyntaxNodeOrToken>> s_pool = new(() => new()); | ||
|
|
||
| public static async ValueTask<TextChangeRange?> ComputeSyntacticChangeRangeAsync( |
There was a problem hiding this comment.
It's unclear why we allow returning null here. It looks like cases where we timeout waiting on the syntax roots. We still had to compute them unless the cancellation token cancelled, can we use that to return a non-null value here?
There was a problem hiding this comment.
Not exactly. We may not have canceled, but we may have otherwise exceeded out time budget. "Cancellation" in these codepaths is for saying: no, stop everything, all results are useless as the client has moved on.
The diffTimeout is different. It's: we still need the results, but we need the fast, so don't do superflous work.
Does that make sense?
There was a problem hiding this comment.
Yea, I meant that exceeding the time budge can return null if the root wasn't retrieved in time. It seems odd we have null in that case, sense we're saying "Assume both of these are completely different", based on the comment. An actual result with the whole tree being different would be easier to understand and consume than guessing what null meant. Is it no difference? Is it all different?
There was a problem hiding this comment.
An actual result with the whole tree being different would be easier to understand and consume than guessing what null meant. Is it no difference? Is it all different?
I see what you mean. To do this, we'd still need the sizes of both the old and new tree. And that means we'd have to then get both roots. I found it easier to just return null trivially and have the caller know that means that the narrow range could not be computed :)
| /// </para> | ||
| /// </summary> | ||
| ValueTask<TextChangeRange?> ComputeSyntacticChangeRangeAsync( | ||
| Document oldDocument, Document newDocument, TimeSpan timeout, CancellationToken cancellationToken); |
There was a problem hiding this comment.
Do we have any telemetry for how long this is taking in practice? I support the idea of the timeout, but if in practice it turns out entirely unnecessary I'm wondering if it's just better to delete.
There was a problem hiding this comment.
The concern i have is that this is a case where i:
- expect the normal case to be fast
- do not want the uncommon case to negatively impact the user experience
This is hard to really get telemetry on. We'd need to collect info and then know that we truly were never really running into the slow case.
| // doc we grab, just that we grab some prior version. This is only used to narrow down the changed range we | ||
| // specify, so it's ok if it's slightly larger because we read in a change from a couple of edits ago. | ||
| var previousDocument = _lastProcessedDocument; |
There was a problem hiding this comment.
I admit I'm not quite understanding this comment. Is there any concern that an edit which is undone might somehow cause problems here? Put another way, if we go from version A to B, and B to C (where C is basically A again), could the compiler's caching and reuse of tokens mean A and C could be the same even though we reported tags for B?
There was a problem hiding this comment.
yes. this has a bad race condition. not sure how i convinced myself this was safe. will have followup pr.
| var oldRoot = await oldDocument.GetRequiredSyntaxRootAsync(cancellationToken).ConfigureAwait(false); | ||
|
|
||
| // If we ran out of time, we have to assume both are completely different. | ||
| if (stopwatch.Elapsed > timeout) |
There was a problem hiding this comment.
It seems a bit odd to me that we're including the fetching of roots in the timeout when we already did that at the caching layer higher. I get the desire that the language service returns a generic "cached object" but it's a bit strange to have timeout code here running which we expect to be instantaneous. Also a bit strange that the "this is designed to be fast" algorithm is an async method in the first place, since the only way it's ever fast is if the caller already knows the roots are available.
| using var leftOldStack = s_pool.GetPooledObject(); | ||
| using var leftNewStack = s_pool.GetPooledObject(); | ||
| using var rightOldStack = s_pool.GetPooledObject(); | ||
| using var rightNewStack = s_pool.GetPooledObject(); | ||
|
|
||
| leftOldStack.Object.Push(oldRoot); | ||
| leftNewStack.Object.Push(newRoot); | ||
| rightOldStack.Object.Push(oldRoot); | ||
| rightNewStack.Object.Push(newRoot); |
There was a problem hiding this comment.
The stacks are only used in the implementations of the local methods? Just move them in? That way we're not allocating ones which may not be used (the right stacks), and it probably also means the local functions can become static?
| // Similarly, if we've run out of time, just return what we've computed so far. It's not as accurate as | ||
| // we could be. But the caller wants the results asap. | ||
| if (stopwatch.Elapsed > timeout) | ||
| return currentOld.FullSpan.Start; |
There was a problem hiding this comment.
I love that we're able to give a "good enough" answer like this.
Needs a benchmark.