Avoid OOP syncing when doing a nav-to search during the initial load of a solution.#52351
Avoid OOP syncing when doing a nav-to search during the initial load of a solution.#52351CyrusNajmabadi merged 19 commits intodotnet:mainfrom
Conversation
| => false; | ||
|
|
||
| public bool TryNavigateToSpan(Workspace workspace, DocumentId documentId, TextSpan textSpan, OptionSet options, CancellationToken cancellationToken) | ||
| public bool TryNavigateToSpan(Workspace workspace, DocumentId documentId, TextSpan textSpan, OptionSet options, bool allowInvalidSpan, CancellationToken cancellationToken) |
There was a problem hiding this comment.
a general concept here is that navto can return old data from a previous session which may not be invalid. currently TryNavigateTo can throw if someone passes in a span that isn't in teh bounds of the current doc (despite being called 'try...'). This now allows that behavior to be controlled. By default, we still throw (so that we can catch actual logic bugs in our code). However, in the case that we know we may truly not expect to be within the bounds of the doc anymore, we dont' want to throw as that could reasonable happen.
| // make sure we only process this project if we didn't already process it above. | ||
| if (processedProjects.Add(currentProject)) | ||
| tasks.Add(Task.Run(() => SearchAsync(currentProject, priorityDocs.ToImmutableArray(), seenItems, isFullyLoaded), _cancellationToken)); | ||
| tasks.Add(Task.Run(() => SearchAsync(currentProject, priorityDocs.Where(d => d.Project == currentProject).ToImmutableArray(), seenItems, isFullyLoaded), _cancellationToken)); |
There was a problem hiding this comment.
previously we could search a project, but then say: hey... tehse docs from this other project should be prioritized. It's pretty non-sensical and led to more complex reasoning downsteram. Now, i've strictly doc'ed in teh interface that the priority docs are always from the project being passed in, and it means cleaner code and easier reasoning downstream.
| { | ||
| var boundedTextSpan = GetSpanWithinDocumentBounds(textSpan, text.Length); | ||
| if (boundedTextSpan != textSpan) | ||
| if (boundedTextSpan != textSpan && !allowInvalidSpan) |
There was a problem hiding this comment.
we're always resilient to the span being requested not being in the span of the tree. however, by default, we will log a watson for that case right below this. Now, in the case of showing stale navto entries, we do not report such a watson as being not within bounds of hte doc is not considered an exceptional circumstance.
| /// classifications are only returned if they match the content the file currently has.</param> | ||
| ValueTask<SerializableClassifiedSpans?> GetCachedSemanticClassificationsAsync( | ||
| SerializableDocumentKey documentKey, | ||
| DocumentKey documentKey, |
There was a problem hiding this comment.
we had a dual of a primitive data type and it's 'serializable' equivalent. however, the primitive data type was trivial to make serializable itself. so this dual just went away.
| if (result != null) | ||
| return result; | ||
|
|
||
| return NoOpPersistentStorage.Instance; |
There was a problem hiding this comment.
no change in logic. this just makes it easier to set a breakpoint on the failure case when debugging issues.
dibarbet
left a comment
There was a problem hiding this comment.
are there existing tests for the non-fully loaded code path that covers all the new cached code paths?
| (PatternMatchKind.LowercaseSubstring, NavigateToMatchKind.Fuzzy)); | ||
|
|
||
| public static Task SearchProjectInCurrentProcessAsync( | ||
| public static Task SearchFullyLoadedProjectInCurrentProcessAsync( |
There was a problem hiding this comment.
is this fully loaded in the context of the operation progress service, or fully loaded in that the solution is sync'd to the oop?
There was a problem hiding this comment.
fully loaded in the context of the operation-progress-service. effectively: if the project system has handed everything over to us, then we do the full search. otherwise, if they're still loading, we do the cached search.
src/Features/Core/Portable/NavigateTo/AbstractNavigateToSearchService.cs
Outdated
Show resolved
Hide resolved
src/Features/Core/Portable/NavigateTo/INavigateToSearchService.cs
Outdated
Show resolved
Hide resolved
|
|
||
| public Task SearchProjectAsync(Project project, ImmutableArray<Document> priorityDocuments, string searchPattern, IImmutableSet<string> kinds, Func<INavigateToSearchResult, Task> onResultFound, bool isFullyLoaded, CancellationToken cancellationToken) | ||
| { | ||
| return isFullyLoaded |
There was a problem hiding this comment.
So it seems like when the solution is not fully loaded (from operation progress), we now search the cached results instead of the actual documents which avoids waiting for the oop. But all projects may not be loaded yet so the document set being searched in the cached results may not be exhaustive.
This probably isn't relevant to this PR specifically, but is it possible to cache the projects for a solution? So that even if the projects aren't fully loaded yet, the cached search can find them based on the solution being searched?
There was a problem hiding this comment.
But all projects may not be loaded yet so the document set being searched in the cached results may not be exhaustive.
Correct. And the UI informs the user of that.
This probably isn't relevant to this PR specifically, but is it possible to cache the projects for a solution? So that even if the projects aren't fully loaded yet, the cached search can find them based on the solution being searched?
It is probably something we could do in teh future :) it's def an interesting idea. my general concern about those sorts of things is ensuring they get properly invalidated/updated as the solution changes over time.
src/Workspaces/Core/Portable/Workspace/Host/PersistentStorage/DocumentKey.cs
Outdated
Show resolved
Hide resolved
src/Workspaces/Remote/ServiceHub/Services/NavigateToSearch/RemoteNavigateToSearchService.cs
Outdated
Show resolved
Hide resolved
| private static Task<SyntaxTreeIndex?> LoadAsync(Document document, Checksum checksum, CancellationToken cancellationToken) | ||
| => LoadAsync(document.Project.Solution.Workspace, DocumentKey.ToDocumentKey(document), checksum, GetStringTable(document.Project), cancellationToken); | ||
|
|
||
| public static async Task<SyntaxTreeIndex?> LoadAsync( |
There was a problem hiding this comment.
should this take in the storage service instead of the workspace? iirc Tomas was wanting to remove workspace references from the OOP, so moving the workspace references up to the top might make it easier to remove later on down the line.
There was a problem hiding this comment.
i think that will take a lot of work no matter what. so i'd be ok just waiting until that stage.
|
|
||
| private static SyntaxTreeIndex? ReadFrom( | ||
| StringTable stringTable, ObjectReader reader, Checksum checksum) | ||
| StringTable stringTable, ObjectReader reader, Checksum? checksum) |
There was a problem hiding this comment.
I was looking through the code to see if the usages of checksum allowed nullable, and it appears like it should. But the constructor for SyntaxTreeIndex is not nullable enabled
http://sourceroslyn.io/#Microsoft.CodeAnalysis.Workspaces/FindSymbols/SyntaxTree/SyntaxTreeIndex.cs,28
But the property that checksum gets assigned to is in a nullable enabled file and is specifically not null
http://sourceroslyn.io/#Microsoft.CodeAnalysis.Workspaces/FindSymbols/SyntaxTree/SyntaxTreeIndex_Persistence.cs,c8398183df844544
Might be good to update the annotations there to explicitly allow null (and nullable enable the ctor)
There was a problem hiding this comment.
sure. will do.
Followup to #52315. This can be reviewed once that goes in.
This PR changes our behavior around nav-to during the period of time when the solution is loading. In the previous PR we switched to not generate SG files during solution load, but now takes things further. In this PR we also make it so we don't even sync the host with OOP before running the nav to search. Instead, we just take whatever data we have stored in the OOP cache and query it directly for matches, reporting whatever we have.
Because of this, we save on the entirety of the OOP sync cost, which on my machine (with release, ngen'ed, etc. dlls) is already around 35 seconds. Without this, the entire operation goes down to around 2 seconds to search the entire nav-to DB.
However, this performance comes with a caveat. Because we are not syncing our host data to oop, it's possible that we will find results for files that either no longer exist, or for locations in files that are incorrect. For the first case, we filter out any such results on the host side when we get the answers back from oop. For the second case, we accept that the locations may be temporarily incorrect until the solution is fully loaded, and we then go query for accurate results.
During this time, we also show this information to the user:
This lets them know that data may not be complete during this time, helping to prime an expectation that this is a 'best effort' result, and they may need to rerun the search later to get full results.