Skip to content

Move to a simpler form of checksum->solution caching in the remote workspace layer.#62872

Closed
CyrusNajmabadi wants to merge 39 commits intodotnet:mainfrom
CyrusNajmabadi:oopCaching3
Closed

Move to a simpler form of checksum->solution caching in the remote workspace layer.#62872
CyrusNajmabadi wants to merge 39 commits intodotnet:mainfrom
CyrusNajmabadi:oopCaching3

Conversation

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

@CyrusNajmabadi CyrusNajmabadi commented Jul 22, 2022

The main thing this improves is that currently we have a very very weird model in RemoteWorkspace where a single checksum can map both to a forked solution, but then potentially the primary workspace solution later on (once an 'UpdatePrimaryWorkspace' call comes in). The existing code did not handle this well which led to some serious confusion about which solution to use.

The new code cleanly splits the concepts of the cache for the primary workspace solutions and the cache for non-primary ones, making it easier to reason about.

@CyrusNajmabadi CyrusNajmabadi requested a review from dibarbet July 22, 2022 18:12
@ghost ghost added the Area-IDE label Jul 22, 2022
await SetLastRequestedSolutionAsync(solutionChecksum, fromPrimaryBranch, refCountedLazySolution, cancellationToken).ConfigureAwait(false);
await _anyBranchSolutionCache.SetLastRequestedSolutionAsync(solutionChecksum, refCountedLazySolution, cancellationToken).ConfigureAwait(false);
if (updatePrimaryBranch)
await _primaryBranchSolutionCache.SetLastRequestedSolutionAsync(solutionChecksum, refCountedLazySolution, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just inlined this method as i didn't see much benefit in the helper.

// If this was a notification about the primary solution, then attempt to promote any solution we found to
// be the solution for this workspace.
if (fromPrimaryBranch)
(newSolution, _) = await TryUpdateWorkspaceCurrentSolutionAsync(workspaceVersion, newSolution, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is now part of GetOrCreateSolutionAsync

{
var currentSolution = this.CurrentSolution;
// We were asked to update the primary-branch solution. So take the any-branch solution and promote it to
// the primary-branch-level.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the clever bit. we effectively ensure that whatever solution we made for this checksum gets promoted to be in the _primarySolutionCache.

What's critical to understand here is that each cache may have teh same checksum point to a different solution. The _anyBranch cache can point from that checksum to a 'forked solution' that it generated (which then has some random branch id).

The _primaryBranch cache will then instead map from the checksum to an actual primary-branch-id solution if possible (which may be a different instance than the forked solution).

Later lookups always look in the _primaryBranch cache first, so this better solution will always be found.

This was much simpler to code up and ensure proper working versus trying to have a single mapping from checksum->solution that then might have to mutate in order update what that checksum pointed at.

_gate = gate;
}

public async ValueTask<ReferenceCountedDisposable<LazySolution>?> TryFastGetSolutionAsync(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these methods are basically the same named methods from before, just simplified now that this cache object only stores the last accessed item.

}

public async ValueTask<ReferenceCountedDisposable<LazySolution>> SlowGetOrCreateSolutionAsync(
Checksum solutionChecksum, Func<CancellationToken, Task<Solution>> getSolutionAsync, CancellationToken cancellationToken)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as before, except that getSolutionAsync can be passed in since there is different behavior for the anyBranch cache versus the primaryBranchCache.

await _anyBranchSolutionCache.SlowGetOrCreateSolutionAsync(
solutionChecksum,
cancellationToken => ComputeSolutionAsync(assetProvider, solutionChecksum, cancellationToken),
cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: teh lambda we pass in here is the same code that used to be in SlowGetSolutionAsync at this line:

https://github.com/dotnet/roslyn/pull/62872/files#diff-fedbb1bbc0cee7c7ea4dbd51b8b27023e351ec9a062fb4623fdb8063b5517d30L271

var anyBranchSolution = await anyBranchRefCountedSolution.Target.Task.WithCancellation(cancellationToken).ConfigureAwait(false);
var (primaryBranchSolution, _) = await this.TryUpdateWorkspaceCurrentSolutionAsync(workspaceVersion, anyBranchSolution, cancellationToken).ConfigureAwait(false);
return primaryBranchSolution;
},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however, this lambda is different. for the primaryBranch cache we first get hte anyBranch solution, then try to promote it to be the primary branch solution for this workspace. We then cache that new solution instead for this checksum.

}

public async Task SetLastRequestedSolutionAsync(Checksum solutionChecksum, ReferenceCountedDisposable<LazySolution> solution, CancellationToken cancellationToken)
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before, just simpler as there's only one 'last requested' solution to update.

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review July 22, 2022 18:25
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner July 22, 2022 18:25
@CyrusNajmabadi CyrusNajmabadi marked this pull request as draft July 22, 2022 18:29
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review July 22, 2022 18:29

var solutionChecksum = await solution.State.GetChecksumAsync(CancellationToken.None);
var synched = await remoteWorkspace.GetTestAccessor().GetSolutionAsync(assetProvider, solutionChecksum, fromPrimaryBranch: false, workspaceVersion: -1, CancellationToken.None);
var synched = await remoteWorkspace.GetTestAccessor().GetSolutionAsync(assetProvider, solutionChecksum, updatePrimaryBranch: false, workspaceVersion: -1, CancellationToken.None);
Copy link
Copy Markdown
Contributor Author

@CyrusNajmabadi CyrusNajmabadi Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed this to be clear what this flag is intending to convey. the flag it an actual request to the RemoteWorkspace to not only get the solution but update it's primary solution to point at that.

/// <summary>
/// Wrapper around asynchronously produced solution. The computation for producing the solution will be
/// canceled when the number of in-flight operations using it goes down to 0.
/// </summary>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i moved away from ReferenceCountedDisposable. It was getting very hard to track what was going on, and the indirections it forced added a lot of complexity around locking and updating internal state here. By using a private type for the cache, and the item being cached, we can safely and uniformly lock around everything, which enables us to perform all operations, synchronously, atomically and sanely.

public void IncrementInFlightCount_WhileAlreadyHoldingLock()
{
Contract.ThrowIfFalse(_cache._gate.CurrentCount == 0);
Contract.ThrowIfTrue(_inFlightCount < 1);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll see a ton of contract checks around the bookkeeping. I absolutely do not want to get this wrong as taht could cause:

  1. an incoming request to try to use an async-solution in teh canceled state (which would cause cancellation throws on a different token).
  2. a solution to be cached forever if we don't properly clean up after ourselves.


public void IncrementInFlightCount()
{
using (_cache._gate.DisposableWait(CancellationToken.None))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

takign this gate is never cancellable in this type. We always want the increment/decrement code (and any cleanup that causes) to always run so we're in a consistent state.

using (await _gate.DisposableWaitAsync(cancellationToken).ConfigureAwait(false))
{
// From this point on we are mutating state. Ensure we absolutely do not cancel accidentally.
cancellationToken = default;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we acquire these locks, i always null-out the cancellation token so we don't even write an errant line that causes us to cancel after we've done some mutation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not use CancellationToken.None?

@@ -3,18 +3,14 @@
// See the LICENSE file in the project root for more information.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is effectively rewritten. I recommend looking at the after side.

@@ -0,0 +1,96 @@
// Licensed to the .NET Foundation under one or more agreements.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following two types/files (ChecksumToSolutionCache and SolutionAndInFlightCount) are utility types that the main bulk of the change benefits from. I tried to keep them as clear as i coudl given that they're implementing a ref-counting system.

bool updatePrimaryBranch,
Func<Solution, ValueTask<T>> implementation,
CancellationToken cancellationToken)
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the meat of the change and is very relevant.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

@dibarbet ptal.

/// intervening updates, we can cache and keep the solution around instead of having to recompute it.
/// </summary>
private Checksum? _lastRequestedChecksum;
private SolutionAndInFlightCount? _lastRequestedSolution;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what we used to use the primary branch solution for correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct.

using (await _gate.DisposableWaitAsync(cancellationToken).ConfigureAwait(false))
{
// From this point on we are mutating state. Ensure we absolutely do not cancel accidentally.
cancellationToken = default;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not use CancellationToken.None?

// solution we just computed, even if we have returned. This also ensures that if we promoted a
// non-primary-solution to a primary-solution that it will now take precedence in all our caches for this
// particular checksum.
await _anyBranchSolutionCache.SetLastRequestedSolutionAsync(solutionChecksum, solution, cancellationToken).ConfigureAwait(false);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little suspicious of this - in that the retrieval of the last solution and the update of the last solution are not sharing the lock in the cache.

It would allow multiple requests to race between getting the solution and updating the last solution (e.g an 'older solution could overwrite the real last request solution). Maybe it doesnt matter, but it seems like the get and update to the cache should happen under a single request to the lock.

so the last solution just becomes an impl detail of the cache

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like that a lot. let me try to make that change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. change made.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

any reason to not use CancellationToken.None?

Nope. I'll switch to that. They're basically interchangeable to me since we write CancellationToken cancellationToken = default so much in parameter lists.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor Author

closing out in favor of #63028

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants