Do not persist diagnostics just to purge them from memory#53792
Do not persist diagnostics just to purge them from memory#53792CyrusNajmabadi merged 10 commits intodotnet:mainfrom
Conversation
|
@mavasani I wanted to open this to continue the discussion started in: #53743 Effectively, it seems strange to me that diagnostics are using hte persistence service in teh first place (esp. with how they use it). Specifically, it does not appear as if we could ever really successfully load diagnostics from a previous session. That's because we are using versionstamps here, and versionstamps are not stable across sessions of VS. All other usages from all other features (except diagnostics) use checksums for this reason as those are stable and are keyed to the exact contents of the entity we are associating the persisted data with. Furthermore, even if this was supported, it would be weird as we'd effectively be hoping that the persisted diagnostics were valid against the solution we loaded, which could lead to incorrect information being presented. Because of this, and because of hte 64bit work, and because diagnostics should be OOP, i've opened this PR to remove all usages of persisting for diagnostics. Instead, the last results for any particular analyzer+key combo are just held in memory until the next time we compute the updated values for it. This seems like this should not be problematic given the size and scope of diagnostics. In other words, it's common to literally have zero diagnostics much of the time. ANd even in bad situations where a lot of errors are produced (likely through missing references/cascading) you should only have thousands of them. And they will eventually go away once the latest are generated. So it feels like persisting this stuff (esp. as it's in OOP) and esp in a 64 bit world just seems like an unnecessarily expensive and complex thing to do for something conceptually simple. Thoughts on your part about this? |
| var location = Location.Create(document.FilePath, textSpan: default, lineSpan: default); | ||
|
|
||
| await service.SynchronizeWithBuildAsync( | ||
| service.SynchronizeWithBuild( |
There was a problem hiding this comment.
a lot of things became sync as we only needed async for persistence/loading.
|
|
||
| // loading data can be cancelled any time. | ||
| var serializer = new DiagnosticDataSerializer(_owner.AnalyzerVersion, lastResult.Version); | ||
| var serializerVersion = lastResult.Version; |
There was a problem hiding this comment.
in general, where we used to use a Serializer (which points at a 'Version' and assocaited teh diagnostics with that version), we now just pas along the version instead.
| if (lastResult.IsDefault) | ||
| { | ||
| return await LoadInitialAnalysisDataAsync(persistentService, project, cancellationToken).ConfigureAwait(false); | ||
| return await LoadInitialAnalysisDataAsync(project, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
it's unclear to me if this even makes sense. what are we loading initial data from? that said, i'm not sure the lifetimes of proejct-states, so it's possible this has meaning otuside of the concept of persistence across sessions.
| => SaveCoreAsync(project, result, persistentService); | ||
|
|
||
| public Task SaveInMemoryCacheAsync(Project project, DiagnosticAnalysisResult result) | ||
| => SaveCoreAsync(project, result, persistentService: null); |
There was a problem hiding this comment.
these collapsed down to a single concept.
| } | ||
|
|
||
| if (!await TryDeserializeDocumentDiagnosticsAsync(persistentService, serializer, document, builder, cancellationToken).ConfigureAwait(false)) | ||
| if (!TryGetDiagnosticsFromInMemoryStorage(serializerVersion, document, builder)) |
There was a problem hiding this comment.
names were updated to better reflect what it's doing.
| return stateSets.ToImmutable(); | ||
| } | ||
|
|
||
| public static bool OnDocumentReset(IEnumerable<StateSet> stateSets, TextDocument document) |
There was a problem hiding this comment.
these static methods moved to be local functions at the single locations where they were called.
| nameof(SynchronizeWithBuildAsync), | ||
| () => state.SaveAsync(PersistentStorageService, project, result), | ||
| cancellationToken); | ||
| state.SaveToInMemoryStorage(project, result); |
There was a problem hiding this comment.
we were both saving to in-memory cache and saving to persistence layer. now that the latter is gone, this is jsut a single call with no task queueing.
| // Update the state to clear diagnostics and raise corresponding diagnostic updated events | ||
| // on a serialized task queue. | ||
| _taskQueue.ScheduleTask(nameof(ClearErrors), async () => | ||
| _taskQueue.ScheduleTask(nameof(ClearErrors), () => |
There was a problem hiding this comment.
aside: i think we should get rid of this taskqueue. I have a feeling it is the source of lots of potential bugs around having a consistent model of all diagnostics. esp. as these are all just sync operations, i really dont' get why we dont' just apply them synchronously.
There was a problem hiding this comment.
I believe the task queue was introduced to minimize the work done in these callbacks, which is likely to slow down the build. Won't all these code go away from Roslyn once we move to LSP diagnostics?
There was a problem hiding this comment.
taht would be the hope :) we certainly shouldn't have an ExternalErrorDiagUpdateSource in roslyn then :)
src/Features/Core/Portable/Diagnostics/EngineV2/DiagnosticIncrementalAnalyzer.ProjectState.cs
Show resolved
Hide resolved
| : default; | ||
| } | ||
|
|
||
| private void RemoveInMemoryCache(DiagnosticAnalysisResult lastResult) |
There was a problem hiding this comment.
Is this method still used?
There was a problem hiding this comment.
yes. effectively as we produce new results, this is used to purge the previous values we had associated with that analyzer.
This is the code in question:
public void SaveToInMemoryStorage(Project project, DiagnosticAnalysisResult result)
{
Contract.ThrowIfTrue(result.IsAggregatedForm);
Contract.ThrowIfNull(result.DocumentIds);
RemoveInMemoryCache(_lastResult);
// save last aggregated form of analysis result
_lastResult = result.ToAggregatedForm();|
Overall, the change looks good to me. However, @heejaechang introduced the persistent storage for diagnostics for performance reasons, so I would let him provide additional context here on the kind of performance issues that were seen to motivate this storage model for diagnostics. |
|
Given that we are now oop and 64bit, I feel like any motivation here would be way down.
This is potentially interesting. But even then, I'm not sure we should default to adding lots of complexity to support that scenario, given that it should be rare and temporary. Any thoughts @heejaechang |
|
@mavasani for review. note that i'll be doing an RPS pass first to ensure everything is ok. |
| } | ||
|
|
||
| private async Task SerializeAsync(IPersistentStorageService? persistentService, DiagnosticDataSerializer serializer, Project project, TextDocument? document, object key, string stateKey, ImmutableArray<DiagnosticData> diagnostics) | ||
| private void AddToInMemoryStorage(VersionStamp serializerVersion, Project project, TextDocument? document, object key, string stateKey, ImmutableArray<DiagnosticData> diagnostics) |
There was a problem hiding this comment.
💭 I wouldn't have hated to see this stay asynchronous. In the event we did need to offload this data, we would have the option of implementing it on the original asynchronous paths using temporary storage service instead of persistent storage service. We can still do that, but the change would involve reverting a decent portion of this PR.
There was a problem hiding this comment.
@sharwell . That is reasonable. I'll make this all async again (probably ValueTask). We can then easily tweak in the future if appropriate.
|
RPS test passed: https://devdiv.visualstudio.com/DevDiv/_git/VS/pullrequest/328173 |
|
Took sam's suggestion. so if we need to move to another persistence system, we can do so simply. |


WIP to see if there are any issues here.
I recommend reviewing a commit at a time.