Skip to content

stopper: Ensure that Closers and Ctxs are always cleaned up#27920

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/fixStopper
Jul 25, 2018
Merged

stopper: Ensure that Closers and Ctxs are always cleaned up#27920
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/fixStopper

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jul 25, 2018

Before this change, the Stopper made no guarantee that Closers
registered by the AddCloser method or Contexts tied to
the Stopper with WithCancelOn{Quiesce,Stop} were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

The PR also correctly synchronizes access to the sCancels map to
fix #27908.

Release note: None

nvb added 2 commits July 25, 2018 13:24
Before this change the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

Release note: None
@nvb nvb requested a review from tbg July 25, 2018 17:30
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jul 25, 2018

bors r=benesch

craig bot pushed a commit that referenced this pull request Jul 25, 2018
27829: kv,storage: disable pipelining in merge transaction r=nvanbenschoten a=benesch

It turns out that pipelining is fundamentally incompatible with merge
transactions. The extra QueryIntent requests generated by pipelining can
get routed to a RHS that is currently blocked for merging and wedge the
merge. Consider the following sequence of events:

  1. The merge transaction begins and lays down intents on the meta2 and
     local copy of the RHS range descriptor using asynchronous commits.

  2. The RHS loses its lease.

  3. The merge transaction sends a GetSnapshotForMergeRequest. The
     pipeliner injects two QueryIntent requests, one to the meta2 range
     descriptor and one to the local range descriptor on the RHS, as
     non-transactional requests always flush the pipeline.

  4. The QueryIntent request hits the RHS and triggers a lease
     acquisition. The lease acquisition notices that the local range
     descriptor has a deletion intent, infers that a merge is in
     progress, and blocks all traffic, except for GetSnapshotForMerge
     requests.

GetSnapshotForMerge requests are allowed to proceed even when the range
is blocked for merging but QueryIntent requests are not. We can't, in
general, allow QueryIntent requests through, as we might return stale
data if the merge has already committed and the LHS is processing
new writes. We could, I think, allow QueryIntents for the local range
descriptor to proceed, but that's a very-specific corner case and hard
to reason about.

Just disable pipelining in the merge transaction to sidestep the issue.

One other option: we could allow the RHS to serve reads up until the
commit timestamp of the merge transaction. That would allow the
GetSnapshotForMerge and QueryIntent requests from the merge transaction
through, as well as any other read traffic executing beneath the merge
transaction's timestamp. The downside is that the new leaseholder would
have to wait out the full clock offset after the merge txn committed to
serve any writes. (Just like the lease stasis period.) It was decided
that this option was too disruptive and complex to pursue for the time
being.

Fix #27820.

27920: stopper: Ensure that Closers and Ctxs are always cleaned up r=benesch a=nvanbenschoten

Before this change, the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.

The PR also correctly synchronizes access to the `sCancels` map to
fix #27908.

Release note: None

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2018

Build succeeded

@craig craig bot merged commit 297a916 into cockroachdb:master Jul 25, 2018
@nvb nvb deleted the nvanbenschoten/fixStopper branch July 26, 2018 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage: TestStoreGossipSystemData failed under stress

3 participants