Skip to content

Fix connection leaking on cancellation#26178

Merged
heejaechang merged 6 commits intodotnet:masterfrom
heejaechang:moreLog3
Apr 18, 2018
Merged

Fix connection leaking on cancellation#26178
heejaechang merged 6 commits intodotnet:masterfrom
heejaechang:moreLog3

Conversation

@heejaechang
Copy link
Copy Markdown
Contributor

@heejaechang heejaechang commented Apr 16, 2018

Customer scenario

While users are using VS. Connections to servicehub can be leaked and number of connections get increased overtime.

Bugs this fixes

#26183

Workarounds, if any

there is no workaround

Risk

I don't see any risk

Performance impact

It should remove connection leaks.

Is this a regression from a previous update?

No

Root cause analysis

when we create certain type of connection, if cancellation happens at right time, we didn't dispose connection before exception escapes current call.

How was the bug found?

dogfooding and unit testing.

@heejaechang heejaechang requested a review from a team as a code owner April 16, 2018 20:07
@heejaechang heejaechang changed the base branch from master to dev15.7.x April 16, 2018 20:07
}

#if DEBUG
private string _creationCallStack;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only debug? still seems useful in release.

if (!Environment.HasShutdownStarted)
{
Contract.Requires(false, $@"Should have been disposed!");
Contract.Requires(false, $"Should have been disposed!\r\n {_creationCallStack}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not NFW here?

…be leaked not disposed.

fixed the issue and added test
@heejaechang
Copy link
Copy Markdown
Contributor Author

@jinujoseph can we bring this to 15.7? @dotnet/roslyn-ide @dotnet/roslyn-analysis can you take a look?

return null;
}

var scope = await GetPinnedScopeAsync(solution, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Contributor Author

@heejaechang heejaechang Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue is this throwing cancellation and connection not get disposed.

@heejaechang
Copy link
Copy Markdown
Contributor Author

retest windows_debug_vs-integration_prtest please

@heejaechang
Copy link
Copy Markdown
Contributor Author

heejaechang commented Apr 17, 2018

LB issue is known one. I will take a look why sometimes LB doesn't show up.

@heejaechang heejaechang changed the title add callstack on debug to make investigation easier Fix connection leaking on cancellation Apr 17, 2018
Copy link
Copy Markdown
Contributor

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to take a closer look tomorrow.

var connection = new MyConnection(source);
try
{
await SessionWithSolution.CreateAsync(connection, solution, source.Token);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❕ Remove the try/catch here. Use Assert.ThrowsAsync if the code throws.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

// make sure things that should have been cleaned up are cleaned up
var service = (RemotableDataServiceFactory.Service)solution.Workspace.Services.GetService<IRemotableDataService>();
Assert.Null(service.GetRemotableData_TestOnly(solutionChecksum, CancellationToken.None));
Assert.True(connection.Disposed);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ why would this be true here? The ownership transfer isn't clear and the test method doesn't dispose of the object, but I'm also reviewing this on my phone so maybe I just need a better view.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whole point of this test is if it throws connection get disposed correctly.

if (!Environment.HasShutdownStarted)
{
Contract.Requires(false, $@"Should have been disposed!");
#if DEBUG
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 If this had wrapped the entire finalizer from the start, this bug would be much less serious.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you meant.

#if DEBUG
Contract.Fail($"Should have been disposed!\r\n {_creationCallStack}");
#else
Contract.Fail($"Should have been disposed!");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason these are not NFWs?

Copy link
Copy Markdown
Contributor Author

@heejaechang heejaechang Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail will crash VS.

protected Connection()
{
#if DEBUG
_creationCallStack = Environment.StackTrace;
Copy link
Copy Markdown
Contributor

@CyrusNajmabadi CyrusNajmabadi Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only do this in debug? we create connections rarely, and then pool them, right? so this should be cheap enough to do even in release.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no connection is created many times. and we see this issue again we can make this release later.

@heejaechang
Copy link
Copy Markdown
Contributor Author

ping? can I get code review?

@heejaechang
Copy link
Copy Markdown
Contributor Author

@dotnet/roslyn-infrastructure
failed with 13:12:17
C:\Users\runner.nuget\packages\Microsoft.VSSDK.BuildTools\15.8.68-develop-g109a00ff\tools\VSSDK\Microsoft.VsSDK.targets(643,5): error MSB6006: "VsixUtil.exe" exited with code -1073741819. [D:\j\workspace\windows_debug---15a478d3\src\Compilers\Extension\CompilerExtension.csproj

https://ci.dot.net/job/dotnet_roslyn/job/dev15.7.x/job/windows_debug_spanish_unit32_prtest/158/

@heejaechang
Copy link
Copy Markdown
Contributor Author

retest windows_debug_spanish_unit32_prtest please

@heejaechang
Copy link
Copy Markdown
Contributor Author

retest windows_release_vs-integration_prtest please

Copy link
Copy Markdown
Contributor

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction here is fine, but there are a few more clean-up items to polish the test.

_source.Cancel();
_source.Token.ThrowIfCancellationRequested();

return Task.CompletedTask;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡

throw ExceptionUtilities.Unreachable;

public override Workspace Workspace => _workspace;
}

private class MyConnection : RemoteHostClient.Connection
Copy link
Copy Markdown
Contributor

@sharwell sharwell Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❕ Rename type to unambiguously indicate its test behavior (e.g. InvokeThrowsCancellationConnection).

{
sessionWithSolution.Dispose();
// make sure disposable objects are disposed when
// exceptions are thrown
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 This comment is unnecessary and can be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this tells what this test is trying to do.


var source = new CancellationTokenSource();
var connection = new MyConnection(source);
await Assert.ThrowsAnyAsync<OperationCanceledException>(() => SessionWithSolution.CreateAsync(connection, solution, source.Token));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❕ This should invoke the entry point for the error RemoteHostClientExtensions.TryCreateSessionAsync rather than using the inner helper methods from SessionWithSolution.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sure why. the bug is not there. but in CreateAsync.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing of TryCreateSessionAsync would ensure that the complete sequence leading to the bug is corrected:

  1. The connection is created
  2. Something happens
  3. The connection fails to be disposed

As-written, the test does not cover the first step of the scenario. I'll unblock because it may be substantially more work to write the more complete test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I moved (2) down to CreateAsync. and nothing left in TryCreateSessionAsync. but if there is a bug, we still have Contract.Fail.

}
}

[Fact, Trait(Traits.Feature, Traits.Features.RemoteHost)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗️ Missing [WorkItem(...)]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@heejaechang
Copy link
Copy Markdown
Contributor Author

ping? need some code review! @jinujoseph

@heejaechang
Copy link
Copy Markdown
Contributor Author

retest windows_debug_vs-integration_prtest please


var source = new CancellationTokenSource();
var connection = new InvokeThrowsCancellationConnection(source);
await Assert.ThrowsAnyAsync<OperationCanceledException>(() => SessionWithSolution.CreateAsync(connection, solution, source.Token));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: assert that the resulting exception (which is returned so you can assert with it) has OperationCanceledException.CancellationToken equal to source.CancellationToken. Since there's some catching/rethrowing going on we don't want to misassociate the cancellation source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there Assert.ThrowXXX that let me get to the exception? let me take a look.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there Assert.ThrowXXX that let me get to the exception

Yes, they all return the exception instance.

@sharwell
Copy link
Copy Markdown
Contributor

@heejaechang Did you see these?

#26178 (comment)

#26178 (comment)

@heejaechang
Copy link
Copy Markdown
Contributor Author

@jinujoseph are we going to take this for 15.7?

@sharwell
Copy link
Copy Markdown
Contributor

While users are using VS. Connections to servicehub can be leaked and number of connections get increased overtime.

How is this occurring? I would expect the GC to reclaim the connections when they are no longer referenced.

@jinujoseph
Copy link
Copy Markdown
Contributor

@heejaechang , lets target this for 15.8

@heejaechang
Copy link
Copy Markdown
Contributor Author

How is this occurring? I would expect the GC to reclaim the connections when they are no longer referenced.

probably. I didn't look through servicehub's stream implementation to see whether it is safe to let go without dispose and let finalizer to take care of disposing it.

regardless, it is resource that is supposed to be disposed and I am not disposing it. so it is certainly leak.

@heejaechang
Copy link
Copy Markdown
Contributor Author

@jinujoseph do we have 15.8 branch? or master?

@jinujoseph
Copy link
Copy Markdown
Contributor

15.8 - > master

@sharwell
Copy link
Copy Markdown
Contributor

Just waiting on #26178 (comment)

@heejaechang heejaechang changed the base branch from dev15.7.x to master April 18, 2018 21:05
@sharwell
Copy link
Copy Markdown
Contributor

💡 @heejaechang It would be good to revise the Scenario and Performance Impact sections of the original template to more accurately explain the change. Specifically, this is not a performance problem caused by an increasing number of connections, but rather a reliability problem caused by an assertion that the connection is deterministically cleaned up when it is no longer in use. The simple fix of removing the [unnecessary] finalizer would address the reliability problem, but this goes a step further to address the underlying cause of non-deterministic release.

@heejaechang
Copy link
Copy Markdown
Contributor Author

since I am using external lib, I am not sure how I can guarantee I am not leaking anything when I dont dispose disposable resource. also, It will definitively not decrease ref count on shared connection. so it is not just about debug assert.

@jasonmalinowski
Copy link
Copy Markdown
Member

@heejaechang @jinujoseph I'm OK with taking the code fix in 15.8, but if we're going to do that we should get this assert disabled in 15.7 pronto, as this is one of the top flaky issues right now.

@heejaechang
Copy link
Copy Markdown
Contributor Author

@jinujoseph should I create new PR that simply remove the debug assert for 15.7?

@jinujoseph
Copy link
Copy Markdown
Contributor

yes , lets remove the assert from 15.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants