
liveness: run sync disk write in a stopper task#81813

Merged
craig[bot] merged 2 commits into cockroachdb:master from erikgrinaker:liveness-disk-task on May 25, 2022

Conversation

@erikgrinaker
Contributor

@erikgrinaker erikgrinaker commented May 25, 2022

liveness: move stopper to NodeLivenessOptions

Release note: None

liveness: run sync disk write in a stopper task

This patch runs the sync disk write during node heartbeats in a stopper
task. The write is done in a goroutine, so that we can respect the
caller's context cancellation (even though the write itself won't).
However, this could race with engine shutdown when stopping the node,
violating the Pebble contract and triggering the race detector. Running
it as a stopper task will cause the node to wait for the disk write to
complete before closing the engine.

Of course, if the disk stalls then node shutdown will now never
complete. This is very unfortunate, since stopping the node is often
the only mitigation to recover stuck ranges with stalled disks. This is
mitigated by Pebble panicking the node on stalled disks, and by
Kubernetes and other orchestration tools killing the process after some time.

Touches #81786.
Resolves #81511.
Resolves #81827.

Release note: None

@erikgrinaker erikgrinaker requested a review from tbg May 25, 2022 11:31
@erikgrinaker erikgrinaker requested review from a team as code owners May 25, 2022 11:31
@erikgrinaker erikgrinaker self-assigned this May 25, 2022
@cockroach-teamcity
Member

This change is Reviewable

@erikgrinaker erikgrinaker changed the title from "Liveness disk task" to "liveness: run sync disk write in a stopper task" May 25, 2022
@stevendanna
Collaborator

Looks like this will resolve #81827, should have looked before opening that one. Thanks!

@erikgrinaker
Contributor Author

CI failures appear to be unrelated flake.

Collaborator

@stevendanna stevendanna left a comment


Thanks for jumping on this! I suppose if we were worried about the impact of a disk stall on shutdown, we could do something where we give up on the operation after some amount of time. But, I buy the paragraph you wrote about mitigations.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @stevendanna, and @tbg)


pkg/kv/kvserver/liveness/liveness.go line 1277 at r2 (raw file):

			resultCs[i], _ = nl.engineSyncs.DoChan(strconv.Itoa(i), func() (interface{}, error) {
				var taskErr error
				if err := nl.stopper.RunTask(ctx, "liveness-hb-diskwrite", func(ctx context.Context) {

Looks like there is also RunTaskWithErr that would cut down on some of the boilerplate here, but it's fine as is.

Collaborator

@stevendanna stevendanna left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker, @stevendanna, and @tbg)

Collaborator

@nicktrav nicktrav left a comment


:lgtm: Thanks!

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)

Contributor Author

@erikgrinaker erikgrinaker left a comment


I suppose if we were worried about the impact of a disk stall on shutdown, we could do something where we give up on the operation after some amount of time. But, I buy the paragraph you wrote about mitigations.

Yeah, but I don't know if that's always going to be safe -- for example, in this case that means that we'll be closing Pebble while we still have in-flight writes, which is what prompted this in the first place. There's no way to cancel these writes afaik.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nicktrav, @stevendanna, and @tbg)


pkg/kv/kvserver/liveness/liveness.go line 1277 at r2 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Looks like there is also RunTaskWithErr that would cut down on some of the boilerplate here, but it's fine as is.

Oh nice, that's better -- thanks! Definitely not "fine" as is, more like functional. :)

@knz
Contributor

knz commented May 25, 2022

respect the caller's context cancellation (even though the write itself won't)

the write can't? I think go supports this via deadlines on the os.File handle.

@erikgrinaker
Contributor Author

respect the caller's context cancellation (even though the write itself won't)

the write can't? I think go supports this via deadlines on the os.File handle.

Yes, but that applies to all writes, not just this one. We'd also have to propagate that through Pebble. Adding deadlines for all Pebble writes is a riskier change than I want to make here.

@erikgrinaker
Contributor Author

TFTRs!

bors r=stevendanna,nicktrav

@craig
Contributor

craig bot commented May 25, 2022

Build succeeded:

