storage: Use batches for direct RocksDB mutations by itsbilal · Pull Request #55708 · cockroachdb/cockroach

itsbilal · 2020-10-19T19:56:19Z

Currently, doing direct mutations on a RocksDB instance bypasses
custom batching / syncing logic that we've built on top of it.
This, or something internal to RocksDB, started leading to some bugs
when all direct mutations started passing in WriteOptions.sync = true
(see #55240 for when this change went in).

In this change, direct mutations still commit the batch with sync=true
to guarantee WAL syncing, but they go through the batch commit pipeline
too, just like the vast majority of operations already do.
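
As a rough illustration of the shape of this change (not the actual pkg/storage code), a direct mutation can wrap its single write in a batch and commit that batch with sync=true. Every name in this sketch (`engine`, `batch`, `newEngine`, the `commits` and `syncs` counters) is hypothetical and exists only to make the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// engine is a hypothetical in-memory stand-in for the RocksDB wrapper.
type engine struct {
	data    map[string]string
	commits int // batches that went through the commit path
	syncs   int // commits that asked for a WAL sync
}

// batch buffers writes until Commit is called.
type batch struct {
	e   *engine
	ops map[string]string
}

func newEngine() *engine { return &engine{data: map[string]string{}} }

func (e *engine) NewBatch() *batch {
	return &batch{e: e, ops: map[string]string{}}
}

// Put stages a write in the batch; nothing reaches the engine yet.
func (b *batch) Put(key, value string) error {
	if key == "" {
		return errors.New("empty key")
	}
	b.ops[key] = value
	return nil
}

// Commit applies the staged writes through the one shared commit path.
func (b *batch) Commit(sync bool) error {
	for k, v := range b.ops {
		b.e.data[k] = v
	}
	b.e.commits++
	if sync {
		b.e.syncs++
	}
	return nil
}

// Put is the "direct mutation". Rather than writing to the engine itself,
// it builds a one-entry batch and commits it with sync=true.
func (e *engine) Put(key, value string) error {
	b := e.NewBatch()
	if err := b.Put(key, value); err != nil {
		return err
	}
	return b.Commit(true /* sync */)
}

func main() {
	e := newEngine()
	if err := e.Put("a", "1"); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	fmt.Println(e.data["a"], e.commits, e.syncs)
}
```

In this sketch the direct `Put` and an explicit batch share a single commit path, so any batching or syncing logic layered on `Commit` applies to both.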

Fixes #55362.

Release note: None.

cockroach-teamcity · 2020-10-19T19:56:27Z

This change is Reviewable

petermattis

Presuming this fixes the bug, let's disable the DBImpl::{Put,Merge,Delete,SingleDelete,DeleteRange} code paths, or at least revert the use of sync = true there.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, @petermattis, and @tbg)

pkg/storage/rocksdb.go, line 516 at r1 (raw file):

		return err
	}
	return b.Commit(true)

Let's annotate these trues with Commit(true /* sync */).
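
For readers unfamiliar with the convention, here is a minimal sketch of the annotation being requested. The `commit` function is a hypothetical stand-in, not the real `Commit` method in pkg/storage/rocksdb.go.

```go
package main

import "fmt"

// commit is a hypothetical stand-in for a batch's Commit method.
func commit(sync bool) string {
	if sync {
		return "committed, WAL synced"
	}
	return "committed, WAL not synced"
}

func main() {
	// Unannotated: the reader has to look up what the bare true means.
	fmt.Println(commit(true))
	// Annotated: the inline comment names the parameter at the call site.
	fmt.Println(commit(true /* sync */))
}
```

The two calls behave identically; the comment only documents the boolean argument for whoever reads the call.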

itsbilal

TFTR!

Removed the use of sync = true in those code paths. Can't remove them entirely, as some tests depend on them.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens, @petermattis, and @tbg)

pkg/storage/rocksdb.go, line 516 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Let's annotate these trues with Commit(true /* sync */).

Done.

itsbilal

Re: bugfix, I've run engine/switch/nodes=3 ~50 times with no repro, and engine/switch/encrypted ~20 times. Given the latter was failing approx. 50% of the time before this change, I'm pretty confident this fixes it.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens, @petermattis, and @tbg)

petermattis

Were you able to reproduce the engine/switch/nodes=3 failure without this PR?

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, and @tbg)

pkg/storage/rocksdb.go, line 561 at r2 (raw file):

// It is safe to modify the contents of the arguments after ApplyBatchRepr
// returns.
func (r *RocksDB) ApplyBatchRepr(repr []byte, sync bool) error {

Can we get rid of the sync argument here? I don't think it is ever used. Or perhaps we should just ignore this, as we're removing RocksDB shortly anyway.

tbg · 2020-10-20T08:16:10Z

LGTM

itsbilal · 2020-10-20T14:39:02Z

Yes, I was able to repro it after ~50 runs. It's a lot less frequent than the engine/switch/encrypted reproduction, which was almost 1 in 2. This change does seem to fix both.
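
A back-of-the-envelope check on how much the clean runs tell us. This is only a sketch: it assumes runs are independent, takes the pre-fix failure rates as roughly 1 in 2 for engine/switch/encrypted and roughly 1 in 50 for engine/switch/nodes=3 (the latter inferred from a single repro in ~50 runs), and `passAll` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// passAll returns the probability that `runs` independent runs all pass
// when each run fails with probability failRate.
func passAll(failRate float64, runs int) float64 {
	return math.Pow(1-failRate, float64(runs))
}

func main() {
	// engine/switch/encrypted: ~20 clean runs against an assumed ~50% failure rate.
	fmt.Printf("encrypted: %.2g\n", passAll(0.5, 20))
	// engine/switch/nodes=3: ~50 clean runs against an assumed ~1-in-50 failure rate.
	fmt.Printf("nodes=3:   %.2g\n", passAll(1.0/50, 50))
}
```

Under these assumptions, 20 clean encrypted runs would happen by chance only about once in a million tries if the bug were still present, so that result is strong evidence. For nodes=3, 50 clean runs would still occur by chance roughly 36% of the time, so that result on its own is much weaker.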

itsbilal · 2020-10-20T15:18:58Z

TFTRs!

bors r+

petermattis · 2020-10-20T15:21:18Z

We'll want to backport this PR to 20.2, 20.1, and 19.2.

craig · 2020-10-20T16:47:03Z

Build succeeded:

GitHub CI (Cockroach)

itsbilal requested review from jbowens, petermattis and tbg October 19, 2020 19:56

itsbilal self-assigned this Oct 19, 2020

petermattis reviewed Oct 19, 2020

itsbilal force-pushed the rocksdb-use-batch-ops branch 2 times, most recently from 935c1cd to e0a0e7a Compare October 19, 2020 21:25

itsbilal commented Oct 19, 2020

petermattis approved these changes Oct 20, 2020

itsbilal force-pushed the rocksdb-use-batch-ops branch from e0a0e7a to 8978797 Compare October 20, 2020 14:12

This was referenced Oct 20, 2020

release-20.2: storage: Use batches for direct RocksDB mutations #55745

Merged

release-20.1: storage: Use batches for direct RocksDB mutations #55746

Merged

release-19.2: storage: Use batches for direct RocksDB mutations #55747

Closed

craig bot merged commit d58b0dc into cockroachdb:master Oct 20, 2020
