storage: take an engine checkpoint during failing consistency checks #36867

Merged: craig[bot] merged 2 commits into cockroachdb:master from tbg:fix/rocks-checkpoints on Apr 17, 2019

Conversation

@tbg (Member) commented Apr 16, 2019

This takes a checkpoint on the nodes with replicas of a failing range,
before the failure leads to nodes shutting down. The checkpoint will, for
the replicas of the affected range, be taken at the same Raft log position.

Release note: None

@tbg tbg requested a review from a team April 16, 2019 10:43
@cockroach-teamcity (Member)

This change is Reviewable

@tbg tbg requested a review from nvb April 16, 2019 10:43
@tbg (Member, Author) commented Apr 16, 2019

A failed consistency check will effectively leak the checkpoint, which I think is fine for now -- customers running into one are always supposed to (and likely to) contact support.

If we're paranoid about someone running out of disk space after not realizing that they have RocksDB checkpoints created after inconsistencies, we need to figure out a better UX, but for now I'd say we add an env var that's set by default when using roachprod.

@tbg tbg force-pushed the fix/rocks-checkpoints branch from 1d38985 to d44e47f Compare April 16, 2019 13:46
@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


c-deps/libroach/db.cc, line 269 at r2 (raw file):

    return ToDBStatus(status);
  }
  status = cp_ptr->CreateCheckpoint(cp_dir, 0 /* log_size_for_flush */);

consider saying something about what that 0 does, if it's interesting. I think a lot of people (like myself right now) will stop reading here and not dig into CreateCheckpoint() to see.


c-deps/libroach/db.cc, line 274 at r2 (raw file):

    return ToDBStatus(status);
  }
  return kSuccess;

can't we always return ToDBStatus(status); ?


c-deps/libroach/include/libroach.h, line 104 at r2 (raw file):

// Creates a RocksDB checkpoint in the specified directory (which must not exist).
DBStatus DBCreateCheckpoint(DBEngine* db, DBSlice dir);

pls put some words about what a checkpoint is - something about hardlinks to current SSTs


pkg/roachpb/api.proto, line 1144 at r2 (raw file):

  // If set, a checkpoint (i.e. cheap backup) of the engine will be taken. This
  // is expected to be set only if we already know that there is an
  // inconsistency and we want to preserve as much state as possible.

pls say where the checkpoint will live (how to find it)


pkg/storage/consistency_queue_test.go, line 196 at r2 (raw file):

	defer cache.Release()

	for i := 0; i < numStores; i++ {

say something about not using in-memory engines because we want to take snapshots. But actually, why can't we use in-mem engines (and still take snapshots)?


pkg/storage/replica_consistency.go, line 92 at r2 (raw file):

		// queue is asking us to report a diff, so it has run a previous check
		// which failed.
		Checkpoint: args.WithDiff && args.Mode == roachpb.ChecksumMode_CHECK_VIA_QUEUE,

this already confusing dance is becoming even more complex. How about just plumbing a checkpoint bool through args?


pkg/storage/replica_proposal.go, line 173 at r2 (raw file):

	if cc.Checkpoint {
		checkpointBase := filepath.Join(r.store.engine.GetAuxiliaryDir(), "checkpoints")
		_ = os.MkdirAll(checkpointBase, 0700)

nit: 0700 /* owner rwx */


pkg/storage/replica_proposal.go, line 178 at r2 (raw file):

			log.Warningf(ctx, "unable to create checkpoint %s: %s", checkpointDir, err)
		} else {
			log.Infof(ctx, "created checkpoing %s", checkpointDir)

poing poing


pkg/storage/engine/engine_test.go, line 804 at r2 (raw file):

	defer cleanup()

	// TODO(tbg): use the method below, but only for on-disk stores.

sorry, what's going on here?


pkg/storage/engine/engine_test.go, line 821 at r2 (raw file):

	assert.NoError(t, err)
	assert.NoError(t, db.CreateCheckpoint(dir))

can you open and read the checkpoint you've just created? (as a RocksDB engine)

@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/consistency_queue_test.go, line 196 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

say something about not using in-memory engines because we want to take snapshots. But actually, why can't we use in-mem engines (and still take snapshots)?

what does engine.GetAuxiliaryDir() return for in-mem engines?


pkg/storage/replica_proposal.go, line 172 at r2 (raw file):

	snap := r.store.engine.NewSnapshot()
	if cc.Checkpoint {
		checkpointBase := filepath.Join(r.store.engine.GetAuxiliaryDir(), "checkpoints")

should we protect against in-mem engines here somehow? What's going to happen for those?


pkg/storage/replica_proposal.go, line 173 at r2 (raw file):

	if cc.Checkpoint {
		checkpointBase := filepath.Join(r.store.engine.GetAuxiliaryDir(), "checkpoints")
		_ = os.MkdirAll(checkpointBase, 0700)

why MkdirAll and not Mkdir?

We've repeatedly wanted this to preserve state when finding replica
inconsistencies.

See, for example:

cockroachdb#36861

Release note: None
@tbg tbg force-pushed the fix/rocks-checkpoints branch from d44e47f to 4e8d768 Compare April 17, 2019 12:22
@tbg (Member, Author) left a comment

:lgtm:

@petermattis could you give the C++ changes a glance to make sure I'm not doing something silly with memory mgmt?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


c-deps/libroach/db.cc, line 269 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider saying something about what that 0 does, if it's interesting. I think a lot of people (like myself right now) will stop reading here and not dig into CreateCheckpoint() to see.

Done.


c-deps/libroach/db.cc, line 274 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can't we always return ToDBStatus(status); ?

Done.


c-deps/libroach/include/libroach.h, line 104 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls put some words about what a checkpoint is - something about hardlinks to current SSTs

Done.


pkg/roachpb/api.proto, line 1144 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls say where the checkpoint will live (how to find it)

Done.


pkg/storage/consistency_queue_test.go, line 196 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what does engine.GetAuxiliaryDir() return for in-mem engines?

RocksDB abstracts away the file system through an Env. An in-memory engine just has an in-memory env. So in-memory engines work if and only if you write everything through the RocksDB env, which basically means you have to slap a full POSIX API into the Engine interface. We did some of that (had to) for encryption at rest, but it's not enough. This isn't a yak worth shaving.

In-mem engines actually return a temp dir as the auxiliary dir, mostly to make things "mostly work", but that's not behavior that makes a lot of sense or that I want to rely on in this test.


pkg/storage/replica_consistency.go, line 92 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this already confusing dance is becoming even more complex. How about just plumbing a checkpoint bool through args?

Done. Agreed that this is better.


pkg/storage/engine/engine_test.go, line 804 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

sorry, what's going on here?

Removed. The comment was saying "I want to use runWithAllEngines so that pebble automatically gets this test as well, but we need a version of runWithAllEngines that uses non-mem engines". Not worth it right now. The method is in the interface, so pebble will have to learn it anyway.


pkg/storage/engine/engine_test.go, line 821 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you open and read the checkpoint you've just created? (as a RocksDB engine)

Done.

@tbg (Member, Author) left a comment

PS not sure why I LGTM'ed myself. The button looked too inviting, I guess.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)

This takes a checkpoint on the nodes with replicas of a failing range,
before the failure leads to nodes shutting down. The checkpoint will,
for the replicas of the affected range, be taken at the same Raft log
position.

Release note: None
@tbg tbg force-pushed the fix/rocks-checkpoints branch from 4e8d768 to 8b1fc16 Compare April 17, 2019 17:53
@andreimatei (Contributor)

LGTM

@tbg (Member, Author) commented Apr 17, 2019

TFTR!

bors r=andreimatei

@craig (craig bot, Contributor) commented Apr 17, 2019

👎 Rejected by code reviews

@tbg (Member, Author) commented Apr 17, 2019

bors r=andreimatei

@andreimatei (Contributor) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)

craig bot pushed a commit that referenced this pull request Apr 17, 2019
36867: storage: take an engine checkpoint during failing consistency checks r=andreimatei a=tbg

This takes a checkpoint on the nodes with replicas of a failing range,
before the failure leads to nodes shutting down. The checkpoint will, for
the replicas of the affected range, be taken at the same Raft log position.

Release note: None

36915: changefeedccl: remove deprecated changefeed.min_high_water metric r=nvanbenschoten a=danhhz

It was replaced by max_behind_nanos in 19.1.

Release note (backwards-incompatible change): Removed the deprecated
`changefeed.min_high_water` metric; use `changefeed.max_behind_nanos`
instead.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@craig (craig bot, Contributor) commented Apr 17, 2019

Build succeeded
