Skip to content

Fix leak in Fix leak in ExternalErrorDiagnosticUpdateSource#50409

Closed
mavasani wants to merge 2 commits intodotnet:mainfrom
mavasani:FixBuildLeak
Closed

Fix leak in Fix leak in ExternalErrorDiagnosticUpdateSource#50409
mavasani wants to merge 2 commits intodotnet:mainfrom
mavasani:FixBuildLeak

Conversation

@mavasani
Copy link
Contributor

Fixes AB#1265306
Ensure we that we dispose the active cancellation source after a build is completed or cancelled

Fixes [AB#1265306](https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1265306)
Ensure we that we dispose the active cancellation source after a build is completed or cancelled
@mavasani mavasani added this to the 16.9 milestone Jan 12, 2021
@mavasani mavasani requested a review from a team as a code owner January 12, 2021 23:19
{
lock (_gate)
{
// Dispose the active cancellation source only if another build has not already started.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be easier to just have a set of tokens that can all be canceled in bulk, and where the active one is removed once its work is complete? seems like less state machine management.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to have one active cancellation source at a time. A new build request should cancel the last active one to avoid unnecessary tasks from running.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to have one active cancellation source at a time.

Sure. But that doesn't mean we need a literal single-field.

A new build request should cancel the last active one

A new build sholud cancel all running tasks IMO. Once a task stops running, it can remove it's CT from the set we're tracking.

to avoid unnecessary tasks from running.

Note the subtlety here. Cancelling a token doesn't stop a task from running. It just signals a request to it to stop running. The task stops running when it actually finally returns.

All i'm suggesting is (IMO at least) a simple formalization here that needs less of a mutable field, and less external changing of that field. Instead, all we'd have is a set of active tasks (or, more explicitly, a set of active cancellation tokens). When we kick of the task, we put it's CTS in that set. When the task is done (for any reason), it removes its CTS. Cancellation is the trivial, we walk all CTSs and cancel them. No need to have a single field that we then have to properly lock and keep track of and whatnot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, all we'd have is a set of active tasks (or, more explicitly, a set of active cancellation tokens).

@CyrusNajmabadi IMO your approach actually makes the code and contracts more difficult to understand. We currently have a very clearly defined mutable, gate protected state and CTS for the active build:

// Gate for concurrent access and fields guarded with this gate.
private readonly object _gate = new();
private InProgressState? _stateDoNotAccessDirectly;
private CancellationTokenSource? _activeCancellationSourceDoNotAccessDirectly;

Moving away from a single active CTS to a set of these loses the 1:1 relationship between everything tracked for current build. Additionally, it does not remove the requirement for locking and such, as we still need to track/update the current in-progress state within locks.

I am hesitant to make such a change, especially as part of this PR which just aims to fix a leak in the current design. If you still feel strongly about your proposed design, we can talk about this further in a separate issue. However, we need to get this leak resolved asap, I am getting multiple pings from perf team on getting this fixed as it is showing up in bunch of their test scenarios.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We implemented a primitive for this in Git Extensions and Project System.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what Sam shows is similar to what i was thinking about. However, i would not block on this for this PR. I would ask that we storngly consider investing time and effort into large-scale refactorings here for clarity and safety. In this type, for example, things seem very fragile, and i feel there may be other design patterns that would make things better. For example roslyn has many forms of: do async, cancellable work, while moving from one immutable state to another. We shoudl consider if one of those patterns would be more viable here to encapsulate the complexity.

THanks!

Base automatically changed from master to main March 3, 2021 23:53
@mavasani mavasani modified the milestones: 16.9, 16.10 Apr 8, 2021
@mavasani
Copy link
Contributor Author

mavasani commented Apr 8, 2021

@CyrusNajmabadi for review.

Copy link
Contributor

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reimplementation of CancellationSeries, but with less clarity and encapsulation of functionality. We should just switch to using that type.

@mavasani
Copy link
Contributor Author

mavasani commented Apr 8, 2021

This is a reimplementation of CancellationSeries, but with less clarity and encapsulation of functionality. We should just switch to using that type.

@sharwell Please feel free to file a separate issue for any proposed design change. This PR aims at a simple, directed fix for a performance leak based on existing design to address reported bugs.

@mavasani mavasani dismissed sharwell’s stale review April 8, 2021 16:26

No intent of design change in this PR, it is a targeted bug fix for the current design.

Copy link
Contributor

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but tagging @vatsalyaagrawal that i think we should strongly consider an up front investment in improving several of the systems around diagnostics/diagnostic-reporting. Conceptually i don't think it should be that difficult (we have a few streams of data providing us with the latest diags, and we jsut want to get the UI in sync with that). But the current impls make it very hard to maintain and comprehend.

I'm genuinely at the point that i think rewriting small portions of this would likely yield better results than trying to gingerly refactor in code with such complexity.

@mavasani
Copy link
Contributor Author

mavasani commented Apr 9, 2021

Closing in favor of #52488

@mavasani mavasani closed this Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants