
Batch up and flatten diagnostic updates.#23377

Closed
CyrusNajmabadi wants to merge 20 commits into dotnet:master from CyrusNajmabadi:bulkProcessDiagnostics

Conversation

@CyrusNajmabadi
Contributor

In response to #22920 (comment)

This is a tweak to the diagnostic taggers to improve how they respond to a flood of notifications from the underlying diagnostic services. There are a few improvements as part of this PR.

  1. We batch up diagnostic notifications over a 50ms window and process them all at once, instead of enqueuing and processing each notification on the UI thread individually.
  2. Because we're batching, we can do a lot of preprocessing of the notifications to throw out information that has been superseded. For example, if we get a notification that we're actually removing all the diagnostics for a provider, we can just ignore all previous diagnostic-creates that we haven't processed yet. Also, any further diagnostic creations we hear about will supersede previous diagnostic creations for that provider.
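The flattening in point 2 can be sketched roughly as follows. This is a minimal, hypothetical Python model (the PR itself is C#, and all names here are illustrative, not taken from the Roslyn sources): per provider, we keep at most one pending removal plus the latest create per key.

```python
from dataclasses import dataclass, field

# Hypothetical model of the batching described above; names are
# illustrative, not Roslyn's.

@dataclass
class Event:
    kind: str            # "created" or "removed"
    key: str             # e.g. the document the diagnostics apply to
    payload: object = None

@dataclass
class BatchedUpdates:
    remove_args: Event = None
    create_args: dict = field(default_factory=dict)

    def enqueue(self, e: Event) -> None:
        if e.kind == "removed":
            # A removal supersedes every create we haven't processed yet.
            self.remove_args = e
            self.create_args.clear()
        else:
            # A newer create supersedes older creates for the same key.
            self.create_args[e.key] = e
```

When the batch window elapses, the UI thread drains at most one removal plus one create per key, regardless of how many notifications arrived in the meantime.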

@CyrusNajmabadi
Contributor Author

tagging @heejaechang @sharwell @jasonmalinowski

//
// Because we're batching, we can optimize things such that we only store two pieces
// of data per provider. Specifically, whether they were removing all diagnostics, and
// then the last diagnostics they wanted to create.
Contributor Author


Note: it may be possible to optimize things even further, i.e. just store the last DiagnosticsUpdateArgs for a provider. However, I wasn't sure if that would end up breaking anything, i.e. it's not clear to me if there are circumstances where we depend on processing the 'remove' before the 'create'. So I wanted to make the safest change possible.

Contributor

@sharwell sharwell left a comment


From the information I was able to determine through heap dumps (with notable limitations due to available tooling in that area), I believe batching represents a more direct solution to the original problem identified by MemoryWatson. In other words, I have more confidence in this approach than the approach in #22920 for the specific purpose of ensuring the UI thread processing queue does not grow larger than necessary. However, the current approach in this PR has a few downsides so I'd like to look into understanding their impact and considering alternatives that address the original and new concerns.

What I like:

  • This pull request removes old (superseded) events from the processing queue if the UI thread has not already started handling them.

What I don't like:

  • This pull request replaces the use of IForegroundNotificationService with plain task scheduling. The service provides an advantage due to time-boxed incremental processing. In terms of the current pull request, this is roughly equivalent to having a Yield() in the inner loop of OnDiagnosticsUpdatedOnForeground.

What I think we can improve on in the future:

  • I'd like to find a way to remove the Task.Delay(50). We want batching to occur when items are building in the queue, but I don't see a particular advantage to inserting a processing delay to "help" this.

batchedUpdates.removeArgs = e;

// Clear all creates for this provider.
batchedUpdates.createArgs.Clear();
Contributor

@sharwell sharwell Nov 26, 2017


❗️ This logic isn't correct¹: DiagnosticsRemoved can be specific to a particular project or document. We need to do two things here:

  1. We need to modify data structures as necessary to implement the equivalent of this:

    batchedUpdates.removeArgs.UnionWith(e);
  2. We need an efficient way to only clear items from createArgs that overlap with e (and I mean e as opposed to removeArgs).

¹ It's possible the logic is correct based on information which was not obvious to me at the time of review.
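A rough sketch of the fix being suggested here, in hypothetical Python (the PR is C#; this models only the data flow, and the names are illustrative): accumulate removal scopes with a set-union, and drop only the pending creates whose scope overlaps the incoming removal.

```python
from types import SimpleNamespace

def enqueue_scoped(remove_scopes: set, create_args: dict, e) -> None:
    """remove_scopes: union of scopes removed so far;
    create_args: scope -> latest pending create."""
    if e.kind == "removed":
        # Equivalent of batchedUpdates.removeArgs.UnionWith(e).
        remove_scopes.update(e.scopes)
        # Clear only the creates that overlap with e, not all of them.
        for scope in e.scopes:
            create_args.pop(scope, None)
    else:
        create_args[e.scope] = e
```

Under this scheme, a removal scoped to one document no longer discards pending creates for unrelated documents.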

Contributor Author


You're right. While we do remove all diagnostics when we get a DiagnosticsRemoved request:

var id = e.Id;
if (!_idToProviderAndTagger.TryGetValue(id, out var providerAndTagger))
{
// Wasn't a diagnostic source we care about.
return;
}
_idToProviderAndTagger.Remove(id);
DisconnectFromTagger(providerAndTagger.tagger);
OnUnderlyingTaggerTagsChanged(this, new SnapshotSpanEventArgs(_subjectBuffer.CurrentSnapshot.GetFullSpan()));

We should only do this for a removed request that matches our document id. I need to ensure the latter.

Contributor Author


#fixed

Contributor Author


It's possible the logic is correct based on information which was not obvious to me at the time of review.

Nope. You're correct. I missed something. Removal notifications for other documents would cause me to remove the diagnostics for our document.

// We're creating diagnostics. This supersedes all existing creations for
// this id. However, any existing removal for this id will still happen.
var document = e.Solution.GetDocument(e.DocumentId);
batchedUpdates.createArgs[document] = e;
Contributor


❓ Why not use DocumentId as the key here? The use of Document means entire solutions could end up rooted in the work queue.

Contributor Author


It's rooted no matter what: e holds onto a solution. So Document was more natural here, without changing what is rooted.

@CyrusNajmabadi
Contributor Author

This pull request replaces the use of IForegroundNotificationService with plain task scheduling. The service provides an advantage due to time-boxed incremental processing. In terms of the current pull request, this is roughly equivalent to having a Yield() in the inner loop of OnDiagnosticsUpdatedOnForeground.

This is true. However, due to the work I've done here, that time-boxed incremental processing is almost certainly unnecessary. Due to the collapsing of work that I've done, when we wake up we will have at most two actual pieces of work to do: the removal of diagnostics and the creation of diagnostics. This is because we are attached to a single document, and so the majority of notifications we hear about will simply be thrown away immediately (in a matter of microseconds) when we compare our workspace+docId against the workspace+docId in the event.

Because we only have two pieces of work to do, doing work incrementally isn't really necessary as no single piece of work should be substantial enough to warrant yielding.

@CyrusNajmabadi
Contributor Author

I'd like to find a way to remove the Task.Delay(50). We want batching to occur when items are building in the queue, but I don't see a particular advantage to inserting a processing delay to "help" this.

Note: the FGNotificationService has a 50ms delay between processing batches of items. I was preserving that here.

Note that I'd like at least some delay that balances not being noticeable, while still likely catching a huge burst of activity from the diagnostics service. 50ms seems to fit the bill, and was not really a problem in the old system.

@CyrusNajmabadi
Contributor Author

@sharwell ready for another pass.

@CyrusNajmabadi
Contributor Author

Because we only have two pieces of work to do,

Ah no. That's not quite accurate. We have two pieces of work to do per provider that reported events for this document. @heejaechang How many different providers are there? Is there like a provider per analyzer? (so unbounded in length) Or is it more like just a small handful of providers (i.e. one for the compiler, one for all analyzers, etc.).

Thanks!

@CyrusNajmabadi
Contributor Author

At sharwell, I can add the "time-boxed incremental processing" back in if necessary. Though I'm skeptical it will be necessary. Note that very little happens when we actually do processing. All we basically do is go through the list of work (throwing out things unrelated to our document). For work items that apply, all that happens is that the underlying AbstractDiagnosticsTaggerProvider.TaggerProvider simply stores the data and then notifies the TagSource that it changed. The TagSource then in turn does nothing but register a callback to process those tags at some delay in the future (the delay is determined by the type of the diagnostics tag).

So, basically, we do very little work on the UI thread.

@CyrusNajmabadi
Contributor Author

Lol. @sharwell not "at sharwell" :D

{
var token = _owner._listener.BeginAsyncOperation(GetType().Name + "OnDiagnosticsUpdatedOnBackground");

_updateTask = Task.Delay(50).ContinueWith(
Contributor


What cancels the previous update task if the same document changed twice?

Contributor

@heejaechang heejaechang Nov 27, 2017


I see, so this is actually batching by updating every 50ms.

Contributor Author


What cancels the previous update task if the same document changed twice?

No need to cancel. When the task actually executes, it will handle all the notifications that came in between now and then.
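A hypothetical Python sketch of the pattern just described (the PR uses Task.Delay(50).ContinueWith in C#; these names are illustrative): each notification schedules its own delayed drain and nothing is ever cancelled. Whichever drain runs first processes everything queued so far, and later drains simply find an empty queue.

```python
import threading

class Batcher:
    """Never cancels a scheduled drain; a drain handles everything
    that arrived between scheduling and execution."""

    def __init__(self, process, delay=0.05):
        self._lock = threading.Lock()
        self._pending = []
        self._process = process
        self._delay = delay

    def notify(self, item):
        with self._lock:
            self._pending.append(item)
        # One delayed drain per notification, like Task.Delay(50).ContinueWith.
        threading.Timer(self._delay, self._drain).start()

    def _drain(self):
        with self._lock:
            items, self._pending = self._pending, []
        if items:                      # later drains may find nothing to do
            self._process(items)
```

Two notifications a few milliseconds apart produce two timers, but the first drain picks up both items and the second is a no-op.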

Contributor


Yep. I thought you were cancelling the previous task and enqueuing a new one so that it gets delayed again. But this is just putting a 50ms buffer on each update.

Contributor


This still doesn't cancel the update on tagger dispose, though.

Contributor Author


That's true. But worst case that means we just process one more request within 50ms after tagger dispose. That request will see that _disposed is set and will immediately bail. Basically, we have to ask ourselves how important it is to cancel a single request.

{
this.AssertIsBackground();
RegisterNotification(() => OnDiagnosticsUpdatedOnForeground(e));
if (_disposed)
Contributor


Shouldn't this be under a lock?

Contributor Author


No need for a lock here. At worst, we miss some write of 'true' to _disposed. But then the UI request that executes will immediately bail.

Contributor Author


Note: even under a lock we could 'still' miss things, as this field is always racy if accessed from different threads. The only important thing is that UI calls respect it and don't do work if it is set to true. The BG just uses it when it can, to avoid unnecessary work, but it's ok to miss.
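The benign race can be sketched like so, in hypothetical Python (illustrative names, not Roslyn's): the BG check is a best-effort shortcut, while the FG handler's re-check is what actually guarantees no work happens after dispose.

```python
class Tagger:
    def __init__(self, register_foreground):
        self._disposed = False
        self._register = register_foreground   # queues work onto the UI thread

    def dispose(self):
        self._disposed = True

    def on_updated_background(self, e):
        if self._disposed:
            return                 # opportunistic: harmless if we miss the write
        self._register(lambda: self.on_updated_foreground(e))

    def on_updated_foreground(self, e):
        if self._disposed:
            return                 # authoritative: the UI thread always re-checks
        self.process(e)

    def process(self, e):
        pass                       # stores data / notifies the TagSource, etc.
```

Even if the BG thread misses a write to the flag and queues one extra callback, that callback bails on the FG check and does no work.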

// care about.
var ourDocument = _subjectBuffer.AsTextContainer().GetOpenDocumentInCurrentContext();
var ourDocumentId = ourDocument?.Id;
if (ourDocumentId != _currentDocumentId)
Contributor


I would prefer doing this on the BG by listening to the proper workspace events for buffer/workspace change notifications. If those events are broken, we should fix those events.

Contributor Author


We could do that as a later change. It will make things more complex, as we'll have to serialize out our BG requests and filter them against the requests to update _currentDocId. I'm ok doing that as part of a followup PR.

Contributor Author


Note: can you tell me precisely which Workspace/buffer changes I would need to listen to?

Contributor

@heejaechang heejaechang Nov 27, 2017


The ActiveContextChange and WorkspaceChanged events.

@CyrusNajmabadi
Contributor Author

@heejaechang Can you address this part: #23377 (comment)

@heejaechang
Contributor

Each analyzer can have 3 different events (1 syntax, 1 semantic, 1 semantic on other files), so in the current design, 1 analyzer can have 3 taggers. In reality, except for the compiler analyzer, which does all 3, most user analyzers produce only 1 kind of diagnostic (syntax, semantic, or semantic on other files, e.g. a compilation-end analyzer), so we can assume 1 tagger per analyzer most of the time.

@CyrusNajmabadi
Contributor Author

CyrusNajmabadi commented Nov 27, 2017

Hrmm.. that doesn't really address what I'm looking for. Let me rephrase. DiagnosticUpdateArgs has an 'object Id' property in it. Normally, how many unique different Ids will there be? A fixed handful? Or is it unbounded and could be hundreds or more? i.e. Are there unique Ids per user analyzer in the system, i.e. some for Add-Using, some for Generate-Method, etc.? Or is it more like just a couple of Ids for compiler diagnostics, just a couple for all analyzers combined, etc.?

I need to know this to determine if we actually need some sort of yielding approach in the OnDiagnosticsUpdatedOnForeground loop, or if we can leave it as is. If the number of DiagnosticUpdateArgs.Ids is low/bounded, we can keep it as is. Otherwise, we may need to have this yield, and move to processing the remainder of the work in a later timeslice. Or, I can move this to the BG. I'll try that in another PR.

Thanks!

@CyrusNajmabadi
Contributor Author

DiagnosticUpdateArgs has an 'object Id' property in it. 1 analyzer can have at most 3 different object Ids. That's why I said, in the current design, 1 analyzer can have 3 taggers per file.

That's not answering my question :) My question is: around how many unique 'Ids' are there in a normal project? Is it a handful? Or is it unbounded?

@CyrusNajmabadi
Contributor Author

but the cause of that is due to filtering on UI thread.

No, that's not true. I have filtering on the UI thread in this PR without the need for excessive lambda allocations. So there is no reason that UI thread filtering means excess allocations.

@heejaechang
Contributor

The object id is unique per analyzer, per kind, per document/project, and per vsix/nuget the analyzer came from.

there will be at least as many object ids as the number of analyzers, documents, and projects in the solution.

@CyrusNajmabadi
Contributor Author

there will be at least as many object ids as the number of analyzers, documents, and projects in the solution.

Ok. So it's effectively unbounded. You're telling me that in something like Roslyn, there would be thousands of unique Ids, yes?

@CyrusNajmabadi
Contributor Author

object id is unique per analyzer

I don't know what that means. Are there a lot of analyzers? Or are there a fixed number? For example, do you have one analyzer that handles things like Add-Using/Generate-Method, or is it an analyzer per each of these?

@heejaechang
Contributor

You can remove the excessive lambda allocations by filtering on the BG, since it is the tagger that creates and enqueues that many lambdas when it doesn't need to.

@CyrusNajmabadi
Contributor Author

CyrusNajmabadi commented Nov 27, 2017

You can remove the excessive lambda allocations by filtering on the BG

And I can remove it on the FG.

since it is the tagger that creates and enqueues that many lambdas when it doesn't need to

Fixed, as per this PR.

@heejaechang
Contributor

We don't know how many analyzers there will be; it is the diagnostic analyzer framework we have been building. Users can install as many analyzers as they wish; we do not put a limit on how many analyzers they can install into VS or a solution.

Okay, so at most, the number of object Ids you can get is: number of diagnostic analyzers installed (vsix or nuget) * 3 kinds (syntax, semantic, semantic on other files) * number of (documents + projects in the solution) * number of (analyzer vsix * nugets installed)
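To make the scale concrete, here is an illustrative back-of-envelope calculation. The counts are made up, not measured from any real solution, and the trailing installed-analyzers factor is dropped since it duplicates the first term:

```python
# Hypothetical counts for a large solution; none of these are measured.
analyzers = 50        # vsix + nuget analyzers installed
kinds = 3             # syntax, semantic, semantic on other files
documents = 2000
projects = 100

max_object_ids = analyzers * kinds * (documents + projects)
print(max_object_ids)  # 315000
```

Even with modest counts, the bound runs into the hundreds of thousands, which is why the Id space is effectively unbounded.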

@heejaechang
Contributor

Sure, you can remove it on the FG, but once you do filtering on the BG, you don't need to do it on the FG, since there won't be as many FG requests as before...

So you don't need to have the dictionaries to hold onto batched requests, or the delay, etc.

@heejaechang
Contributor

We wanted to have 1 tagger per analyzer, but the current editor tagger doesn't allow dynamic tagger insertion/deletion once the view is created. So we made this aggregating tagger concept and created a nested tagger per analyzer, I believe, which we can dynamically add and remove, while we give the aggregated tagger to the editor.

I am not 100% sure whether that 1 tagger per analyzer deals with all the different events from that analyzer, or we have 1 tagger per each different type of analyzer event (aka object id).

@CyrusNajmabadi
Contributor Author

I have a followup PR here that moves filtering to the BG as well: #23409

@CyrusNajmabadi
Contributor Author

Superseded with #23409

@CyrusNajmabadi
Contributor Author

I am not 100% sure whether that 1 tagger per analyzer deals with all the different events from that analyzer, or we have 1 tagger per each different type of analyzer event (aka object id).

We have one per Id. So it's whatever granularity the diagnostics service creates Ids for. We can't do any better on our end, as that's all the information we're given from DiagnosticUpdateArgs.

@CyrusNajmabadi
Contributor Author

@Pilchie @sharwell This is a much less risky approach to deal with the diagnostic tagging perf issue. It would probably still be good to smoke test. It would be safer to take initially, and then could be followed in a later PR by #23409

@sharwell
Contributor

@CyrusNajmabadi I'm more confident with #23409, and would prefer to focus the validation on that change.

@CyrusNajmabadi
Contributor Author

@sharwell That's fine. I just wanted to give you guys options depending on your risk aversion.

@CyrusNajmabadi
Contributor Author

Closing in favor of: #23448

@CyrusNajmabadi CyrusNajmabadi deleted the bulkProcessDiagnostics branch January 16, 2018 20:54