Skip to content

Avoid OOP syncing when doing a nav-to search during the initial load of a solution.#52351

Merged
CyrusNajmabadi merged 19 commits intodotnet:mainfrom
CyrusNajmabadi:navToOnLoad
Apr 3, 2021
Merged

Avoid OOP syncing when doing a nav-to search during the initial load of a solution.#52351
CyrusNajmabadi merged 19 commits intodotnet:mainfrom
CyrusNajmabadi:navToOnLoad

Conversation

@CyrusNajmabadi
Copy link
Contributor

Followup to #52315. This can be reviewed once that goes in.

This PR changes our behavior around nav-to during the period of time when the solution is loading. In the previous PR we switched to not generate SG files during solution load, but now takes things further. In this PR we also make it so we don't even sync the host with OOP before running the nav to search. Instead, we just take whatever data we have stored in the OOP cache and query it directly for matches, reporting whatever we have.

Because of this, we save on the entirety of the OOP sync cost, which on my machine (with release, ngen'ed, etc. dlls) is already around 35 seconds. Without this, the entire operation goes down to around 2 seconds to search the entire nav-to DB.

However, this performance comes with a caveat. Because we are not syncing our host data to oop, it's possible that we will find results for files that either no longer exist, or for locations in files that are incorrect. For the first case, we filter out any such results on the host side when we get the answers back from oop. For the second case, we accept that the locations may be temporarily incorrect until the solution is fully loaded, and we then go query for accurate results.

During this time, we also show this information to the user:

image

This lets them know that data may not be complete during this time, helping to prime an expectation that this is a 'best effort' result, and they may need to rerun the search later to get full results.

@CyrusNajmabadi CyrusNajmabadi requested review from a team as code owners April 1, 2021 23:19
@CyrusNajmabadi CyrusNajmabadi requested a review from a team April 1, 2021 23:19
@ghost ghost added the Area-IDE label Apr 1, 2021
=> false;

public bool TryNavigateToSpan(Workspace workspace, DocumentId documentId, TextSpan textSpan, OptionSet options, CancellationToken cancellationToken)
public bool TryNavigateToSpan(Workspace workspace, DocumentId documentId, TextSpan textSpan, OptionSet options, bool allowInvalidSpan, CancellationToken cancellationToken)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a general concept here is that navto can return old data from a previous session which may not be invalid. currently TryNavigateTo can throw if someone passes in a span that isn't in teh bounds of the current doc (despite being called 'try...'). This now allows that behavior to be controlled. By default, we still throw (so that we can catch actual logic bugs in our code). However, in the case that we know we may truly not expect to be within the bounds of the doc anymore, we dont' want to throw as that could reasonable happen.

// make sure we only process this project if we didn't already process it above.
if (processedProjects.Add(currentProject))
tasks.Add(Task.Run(() => SearchAsync(currentProject, priorityDocs.ToImmutableArray(), seenItems, isFullyLoaded), _cancellationToken));
tasks.Add(Task.Run(() => SearchAsync(currentProject, priorityDocs.Where(d => d.Project == currentProject).ToImmutableArray(), seenItems, isFullyLoaded), _cancellationToken));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously we could search a project, but then say: hey... tehse docs from this other project should be prioritized. It's pretty non-sensical and led to more complex reasoning downsteram. Now, i've strictly doc'ed in teh interface that the priority docs are always from the project being passed in, and it means cleaner code and easier reasoning downstream.

{
var boundedTextSpan = GetSpanWithinDocumentBounds(textSpan, text.Length);
if (boundedTextSpan != textSpan)
if (boundedTextSpan != textSpan && !allowInvalidSpan)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're always resilient to the span being requested not being in the span of the tree. however, by default, we will log a watson for that case right below this. Now, in the case of showing stale navto entries, we do not report such a watson as being not within bounds of hte doc is not considered an exceptional circumstance.

/// classifications are only returned if they match the content the file currently has.</param>
ValueTask<SerializableClassifiedSpans?> GetCachedSemanticClassificationsAsync(
SerializableDocumentKey documentKey,
DocumentKey documentKey,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had a dual of a primitive data type and it's 'serializable' equivalent. however, the primitive data type was trivial to make serializable itself. so this dual just went away.

if (result != null)
return result;

return NoOpPersistentStorage.Instance;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no change in logic. this just makes it easier to set a breakpoint on the failure case when debugging issues.

@CyrusNajmabadi CyrusNajmabadi requested review from davidwengier, dibarbet and jasonmalinowski and removed request for a team April 2, 2021 08:46
Copy link
Member

@dibarbet dibarbet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there existing tests for the non-fully loaded code path that covers all the new cached code paths?

(PatternMatchKind.LowercaseSubstring, NavigateToMatchKind.Fuzzy));

public static Task SearchProjectInCurrentProcessAsync(
public static Task SearchFullyLoadedProjectInCurrentProcessAsync(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this fully loaded in the context of the operation progress service, or fully loaded in that the solution is sync'd to the oop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully loaded in the context of the operation-progress-service. effectively: if the project system has handed everything over to us, then we do the full search. otherwise, if they're still loading, we do the cached search.


public Task SearchProjectAsync(Project project, ImmutableArray<Document> priorityDocuments, string searchPattern, IImmutableSet<string> kinds, Func<INavigateToSearchResult, Task> onResultFound, bool isFullyLoaded, CancellationToken cancellationToken)
{
return isFullyLoaded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like when the solution is not fully loaded (from operation progress), we now search the cached results instead of the actual documents which avoids waiting for the oop. But all projects may not be loaded yet so the document set being searched in the cached results may not be exhaustive.

This probably isn't relevant to this PR specifically, but is it possible to cache the projects for a solution? So that even if the projects aren't fully loaded yet, the cached search can find them based on the solution being searched?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all projects may not be loaded yet so the document set being searched in the cached results may not be exhaustive.

Correct. And the UI informs the user of that.

This probably isn't relevant to this PR specifically, but is it possible to cache the projects for a solution? So that even if the projects aren't fully loaded yet, the cached search can find them based on the solution being searched?

It is probably something we could do in teh future :) it's def an interesting idea. my general concern about those sorts of things is ensuring they get properly invalidated/updated as the solution changes over time.

private static Task<SyntaxTreeIndex?> LoadAsync(Document document, Checksum checksum, CancellationToken cancellationToken)
=> LoadAsync(document.Project.Solution.Workspace, DocumentKey.ToDocumentKey(document), checksum, GetStringTable(document.Project), cancellationToken);

public static async Task<SyntaxTreeIndex?> LoadAsync(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this take in the storage service instead of the workspace? iirc Tomas was wanting to remove workspace references from the OOP, so moving the workspace references up to the top might make it easier to remove later on down the line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that will take a lot of work no matter what. so i'd be ok just waiting until that stage.


private static SyntaxTreeIndex? ReadFrom(
StringTable stringTable, ObjectReader reader, Checksum checksum)
StringTable stringTable, ObjectReader reader, Checksum? checksum)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking through the code to see if the usages of checksum allowed nullable, and it appears like it should. But the constructor for SyntaxTreeIndex is not nullable enabled
http://sourceroslyn.io/#Microsoft.CodeAnalysis.Workspaces/FindSymbols/SyntaxTree/SyntaxTreeIndex.cs,28

But the property that checksum gets assigned to is in a nullable enabled file and is specifically not null
http://sourceroslyn.io/#Microsoft.CodeAnalysis.Workspaces/FindSymbols/SyntaxTree/SyntaxTreeIndex_Persistence.cs,c8398183df844544

Might be good to update the annotations there to explicitly allow null (and nullable enable the ctor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. will do.

@CyrusNajmabadi CyrusNajmabadi enabled auto-merge April 2, 2021 20:54
@CyrusNajmabadi CyrusNajmabadi merged commit ebcd714 into dotnet:main Apr 3, 2021
@ghost ghost added this to the Next milestone Apr 3, 2021
@CyrusNajmabadi CyrusNajmabadi deleted the navToOnLoad branch April 11, 2021 18:20
@dibarbet dibarbet modified the milestones: Next, 16.10.P3 Apr 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants