Skip to content

Do not persist diagnostics just to purge them from memory#53792

Merged
CyrusNajmabadi merged 10 commits intodotnet:mainfrom
CyrusNajmabadi:diagnosticsPersistence
Jun 2, 2021
Merged

Do not persist diagnostics just to purge them from memory#53792
CyrusNajmabadi merged 10 commits intodotnet:mainfrom
CyrusNajmabadi:diagnosticsPersistence

Conversation

@CyrusNajmabadi
Copy link
Contributor

@CyrusNajmabadi CyrusNajmabadi commented May 31, 2021

WIP to see if there are any issues here.

I recommend reviewing a commit at a time.

@CyrusNajmabadi CyrusNajmabadi requested a review from mavasani May 31, 2021 18:00
@ghost ghost added the Area-IDE label May 31, 2021
@CyrusNajmabadi
Copy link
Contributor Author

@mavasani I wanted to open this to continue the discussion started in: #53743

Effectively, it seems strange to me that diagnostics are using hte persistence service in teh first place (esp. with how they use it). Specifically, it does not appear as if we could ever really successfully load diagnostics from a previous session. That's because we are using versionstamps here, and versionstamps are not stable across sessions of VS. All other usages from all other features (except diagnostics) use checksums for this reason as those are stable and are keyed to the exact contents of the entity we are associating the persisted data with.

Furthermore, even if this was supported, it would be weird as we'd effectively be hoping that the persisted diagnostics were valid against the solution we loaded, which could lead to incorrect information being presented.

Because of this, and because of hte 64bit work, and because diagnostics should be OOP, i've opened this PR to remove all usages of persisting for diagnostics. Instead, the last results for any particular analyzer+key combo are just held in memory until the next time we compute the updated values for it.

This seems like this should not be problematic given the size and scope of diagnostics. In other words, it's common to literally have zero diagnostics much of the time. ANd even in bad situations where a lot of errors are produced (likely through missing references/cascading) you should only have thousands of them. And they will eventually go away once the latest are generated. So it feels like persisting this stuff (esp. as it's in OOP) and esp in a 64 bit world just seems like an unnecessarily expensive and complex thing to do for something conceptually simple.

Thoughts on your part about this?

var location = Location.Create(document.FilePath, textSpan: default, lineSpan: default);

await service.SynchronizeWithBuildAsync(
service.SynchronizeWithBuild(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of things became sync as we only needed async for persistence/loading.


// loading data can be cancelled any time.
var serializer = new DiagnosticDataSerializer(_owner.AnalyzerVersion, lastResult.Version);
var serializerVersion = lastResult.Version;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, where we used to use a Serializer (which points at a 'Version' and assocaited teh diagnostics with that version), we now just pas along the version instead.

if (lastResult.IsDefault)
{
return await LoadInitialAnalysisDataAsync(persistentService, project, cancellationToken).ConfigureAwait(false);
return await LoadInitialAnalysisDataAsync(project, cancellationToken).ConfigureAwait(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unclear to me if this even makes sense. what are we loading initial data from? that said, i'm not sure the lifetimes of proejct-states, so it's possible this has meaning otuside of the concept of persistence across sessions.

=> SaveCoreAsync(project, result, persistentService);

public Task SaveInMemoryCacheAsync(Project project, DiagnosticAnalysisResult result)
=> SaveCoreAsync(project, result, persistentService: null);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these collapsed down to a single concept.

}

if (!await TryDeserializeDocumentDiagnosticsAsync(persistentService, serializer, document, builder, cancellationToken).ConfigureAwait(false))
if (!TryGetDiagnosticsFromInMemoryStorage(serializerVersion, document, builder))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names were updated to better reflect what it's doing.

return stateSets.ToImmutable();
}

public static bool OnDocumentReset(IEnumerable<StateSet> stateSets, TextDocument document)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these static methods moved to be local functions at the single locations where they were called.

nameof(SynchronizeWithBuildAsync),
() => state.SaveAsync(PersistentStorageService, project, result),
cancellationToken);
state.SaveToInMemoryStorage(project, result);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were both saving to in-memory cache and saving to persistence layer. now that the latter is gone, this is jsut a single call with no task queueing.

// Update the state to clear diagnostics and raise corresponding diagnostic updated events
// on a serialized task queue.
_taskQueue.ScheduleTask(nameof(ClearErrors), async () =>
_taskQueue.ScheduleTask(nameof(ClearErrors), () =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: i think we should get rid of this taskqueue. I have a feeling it is the source of lots of potential bugs around having a consistent model of all diagnostics. esp. as these are all just sync operations, i really dont' get why we dont' just apply them synchronously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the task queue was introduced to minimize the work done in these callbacks, which is likely to slow down the build. Won't all these code go away from Roslyn once we move to LSP diagnostics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taht would be the hope :) we certainly shouldn't have an ExternalErrorDiagUpdateSource in roslyn then :)

: default;
}

private void RemoveInMemoryCache(DiagnosticAnalysisResult lastResult)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method still used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. effectively as we produce new results, this is used to purge the previous values we had associated with that analyzer.

This is the code in question:

            public void SaveToInMemoryStorage(Project project, DiagnosticAnalysisResult result)
            {
                Contract.ThrowIfTrue(result.IsAggregatedForm);
                Contract.ThrowIfNull(result.DocumentIds);

                RemoveInMemoryCache(_lastResult);

                // save last aggregated form of analysis result
                _lastResult = result.ToAggregatedForm();

@mavasani
Copy link
Contributor

mavasani commented Jun 1, 2021

Overall, the change looks good to me. However, @heejaechang introduced the persistent storage for diagnostics for performance reasons, so I would let him provide additional context here on the kind of performance issues that were seen to motivate this storage model for diagnostics.

@CyrusNajmabadi
Copy link
Contributor Author

Given that we are now oop and 64bit, I feel like any motivation here would be way down.

especially when there are large number of diagnostics for things such as broken references.

This is potentially interesting. But even then, I'm not sure we should default to adding lots of complexity to support that scenario, given that it should be rare and temporary.

Any thoughts @heejaechang

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review June 1, 2021 15:09
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner June 1, 2021 15:09
@CyrusNajmabadi CyrusNajmabadi requested a review from mavasani June 2, 2021 00:32
@CyrusNajmabadi
Copy link
Contributor Author

@mavasani for review. note that i'll be doing an RPS pass first to ensure everything is ok.

}

private async Task SerializeAsync(IPersistentStorageService? persistentService, DiagnosticDataSerializer serializer, Project project, TextDocument? document, object key, string stateKey, ImmutableArray<DiagnosticData> diagnostics)
private void AddToInMemoryStorage(VersionStamp serializerVersion, Project project, TextDocument? document, object key, string stateKey, ImmutableArray<DiagnosticData> diagnostics)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I wouldn't have hated to see this stay asynchronous. In the event we did need to offload this data, we would have the option of implementing it on the original asynchronous paths using temporary storage service instead of persistent storage service. We can still do that, but the change would involve reverting a decent portion of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell . That is reasonable. I'll make this all async again (probably ValueTask). We can then easily tweak in the future if appropriate.

@CyrusNajmabadi
Copy link
Contributor Author

@CyrusNajmabadi CyrusNajmabadi enabled auto-merge June 2, 2021 18:48
@CyrusNajmabadi
Copy link
Contributor Author

Took sam's suggestion. so if we need to move to another persistence system, we can do so simply.

@CyrusNajmabadi CyrusNajmabadi merged commit 75d1282 into dotnet:main Jun 2, 2021
@ghost ghost added this to the Next milestone Jun 2, 2021
@CyrusNajmabadi CyrusNajmabadi deleted the diagnosticsPersistence branch June 2, 2021 20:27
@RikkiGibson RikkiGibson modified the milestones: Next, 17.0.P2 Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants