storage: fix stopper race in compactor #27699

Merged
craig[bot] merged 1 commit into cockroachdb:master from tbg:fix/stopper-race on Jul 18, 2018

@tbg tbg commented Jul 18, 2018

Starting workers without a surrounding task is unfortunately often not
the right thing to do when the worker accesses other state that might
become invalidated once the stopper begins to stop. In this particular
case, the compactor might end up accessing the engine even though it
had already been closed.

I wasn't able to reproduce this failure in the first place, but I'm pretty sure
this:
Fixes #27232.

Release note: None
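
A minimal sketch of the worker-in-task idea described above, assuming a toy stopper. `miniStopper`, `startCompactor`, and `errDraining` are hypothetical names made up for illustration; this is not the CockroachDB `stop.Stopper` API, and the actual change in this PR may differ. The point it tries to show is only that a worker launched from inside a task is never started once the stopper has begun draining.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDraining is a hypothetical error returned when a task is refused.
var errDraining = errors.New("stopper is draining")

// miniStopper is a heavily simplified, hypothetical stand-in for a stopper.
type miniStopper struct {
	mu       sync.Mutex
	draining bool
	tasks    sync.WaitGroup
	workers  sync.WaitGroup
	quiesce  chan struct{}
}

func newMiniStopper() *miniStopper {
	return &miniStopper{quiesce: make(chan struct{})}
}

// RunTask runs f synchronously, or refuses if draining has begun.
func (s *miniStopper) RunTask(f func()) error {
	s.mu.Lock()
	if s.draining {
		s.mu.Unlock()
		return errDraining
	}
	s.tasks.Add(1)
	s.mu.Unlock()
	defer s.tasks.Done()
	f()
	return nil
}

// RunWorker starts f on its own goroutine.
func (s *miniStopper) RunWorker(f func()) {
	s.workers.Add(1)
	go func() {
		defer s.workers.Done()
		f()
	}()
}

// Stop refuses new tasks, waits for running ones, then signals and joins workers.
func (s *miniStopper) Stop() {
	s.mu.Lock()
	s.draining = true
	s.mu.Unlock()
	s.tasks.Wait()
	close(s.quiesce)
	s.workers.Wait()
}

// startCompactor wraps worker startup in a task, so a stopper that is already
// draining never launches the goroutine that would touch the engine.
func startCompactor(s *miniStopper, started *bool) error {
	return s.RunTask(func() {
		*started = true
		s.RunWorker(func() { <-s.quiesce }) // placeholder for the compaction loop
	})
}

func main() {
	s := newMiniStopper()
	var early, late bool
	errEarly := startCompactor(s, &early) // accepted: stopper is live
	s.Stop()
	errLate := startCompactor(s, &late) // refused: stopper has drained
	fmt.Println(early, errEarly, late, errLate)
}
```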

@tbg tbg requested a review from a team July 18, 2018 13:15
@cockroach-teamcity

This change is Reviewable

@petermattis petermattis left a comment

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

tbg commented Jul 18, 2018 via email

tbg commented Jul 18, 2018

bors r-

errcheck.

@tbg tbg force-pushed the fix/stopper-race branch from e592f4e to d182eff Compare July 18, 2018 15:18

tbg commented Jul 18, 2018

bors r=petermattis

craig bot pushed a commit that referenced this pull request Jul 18, 2018
26362: RFC: follower reads r=bdarnell,nvanbenschoten a=tschottdorf

NB: this is extracted from #21056; please don't add new commentary on the
tech note there.

----

Follower reads are consistent reads at historical timestamps from follower
replicas. They make the non-leader replicas in a range suitable sources for
historical reads.

The key enabling technology is the propagation of **closed timestamp
heartbeats** from the range leaseholder to followers. The closed timestamp
heartbeat (CT heartbeat) is more than just a timestamp. It is a set of
conditions that, if met, guarantee that a follower replica has all state
necessary to satisfy reads at or before the CT.

Consistent historical reads are useful for analytics queries and in particular
allow such queries to be carried out more efficiently and, with appropriate
configuration, away from foreground traffic. But historical reads are also key
to a proposal for [reference-like tables](#26301) aimed at cutting
down on foreign key check latencies particularly in geo-distributed clusters;
they help recover a reasonably recent consistent snapshot of a cluster after a
loss of quorum; and they are one of the ingredients for [Change Data
Capture](#25229).

Release note: None

27699: storage: fix stopper race in compactor r=petermattis a=tschottdorf

Starting workers without a surrounding task is unfortunately often not
the right thing to do when the worker accesses other state that might
become invalidated once the stopper begins to stop. In this particular
case, the compactor might end up accessing the engine even though it
had already been closed.

I wasn't able to repro this failure in the first place, but pretty sure
this:
Fixes #27232.

Release note: None

27704: issues: fix email fallback r=petermattis a=tschottdorf

This was not my email address.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>

craig bot commented Jul 18, 2018

Build succeeded

@craig craig bot merged commit d182eff into cockroachdb:master Jul 18, 2018

@bdarnell bdarnell left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/compactor/compactor.go, line 112 at r1 (raw file):

	// Run the Worker in a Task because the worker holds on to the engine and
	// may still access it even though the stopper has allowed it to close.
	_ = stopper.RunTask(ctx, "compactor", func(ctx context.Context) {

This seems backwards. It should be safe to start workers without a surrounding task at startup (which this is), but those workers should create internal tasks for the work that they do. The worker-in-task pattern is only needed for workers that are created after startup.
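
A rough sketch of the alternative suggested in this comment: start the worker directly (as at startup) and have it wrap each unit of work in its own task. Everything here is an assumption for illustration: `toyStopper`, `toyEngine`, `runCompactor`, and `errQuiescing` are hypothetical, and the shutdown ordering in `Stop` is a simplification rather than the behavior of the real `stop.Stopper`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errQuiescing = errors.New("stopper is quiescing")

// toyEngine is a hypothetical engine that records any use after Close.
type toyEngine struct {
	mu             sync.Mutex
	closed         bool
	compactions    int
	usedAfterClose bool
}

func (e *toyEngine) Compact() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.closed {
		e.usedAfterClose = true
		return
	}
	e.compactions++
}

func (e *toyEngine) Close() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.closed = true
}

// toyStopper is a simplified stand-in, not the real stop.Stopper.
type toyStopper struct {
	mu        sync.Mutex
	quiescing bool
	tasks     sync.WaitGroup
	workers   sync.WaitGroup
	quiesce   chan struct{}
}

// RunTask runs f synchronously, or refuses once quiescing has begun.
func (s *toyStopper) RunTask(f func()) error {
	s.mu.Lock()
	if s.quiescing {
		s.mu.Unlock()
		return errQuiescing
	}
	s.tasks.Add(1)
	s.mu.Unlock()
	defer s.tasks.Done()
	f()
	return nil
}

// RunWorker starts f on its own goroutine; f must return once quiesce closes.
func (s *toyStopper) RunWorker(f func()) {
	s.workers.Add(1)
	go func() {
		defer s.workers.Done()
		f()
	}()
}

// Stop refuses new tasks, drains running ones, joins workers, then runs closer.
func (s *toyStopper) Stop(closer func()) {
	s.mu.Lock()
	s.quiescing = true
	s.mu.Unlock()
	s.tasks.Wait()
	close(s.quiesce)
	s.workers.Wait()
	closer()
}

// runCompactor starts the worker without a surrounding task and wraps each
// engine access in its own task.
func runCompactor(requests int) (int, bool, error) {
	s := &toyStopper{quiesce: make(chan struct{})}
	eng := &toyEngine{}
	work, done := make(chan struct{}), make(chan error)
	s.RunWorker(func() {
		for {
			select {
			case <-work:
				done <- s.RunTask(eng.Compact)
			case <-s.quiesce:
				return
			}
		}
	})
	for i := 0; i < requests; i++ {
		work <- struct{}{}
		if err := <-done; err != nil {
			panic(err)
		}
	}
	s.Stop(eng.Close)
	// Work arriving after shutdown is refused and never reaches the engine.
	lateErr := s.RunTask(eng.Compact)
	return eng.compactions, eng.usedAfterClose, lateErr
}

func main() {
	fmt.Println(runCompactor(3))
}
```

In this toy model the engine is only ever touched from inside a task, so once the stopper refuses tasks the closed engine cannot be reached, regardless of when the worker goroutine itself exits.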

tbg commented Jul 24, 2018 via email

@tbg tbg deleted the fix/stopper-race branch July 26, 2018 09:49

Development

Successfully merging this pull request may close these issues.

teamcity: failed tests on master: testrace/TestRaftSSTableSideloadingUpdatedReplicaID, test/TestRaftSSTableSideloadingUpdatedReplicaID
